The changes in question have been merged to the master branch. We have just started the release process for datasketches-cpp (version 4.1.0). Once this is done, we will start the release process for datasketches-postgresql 1.6.0. In the meantime, you may want to try the latest code with the latest datasketches-cpp from the master branch.

On Wed, Apr 19, 2023 at 12:58 AM Jon Malkin <[email protected]> wrote:

> As noted in the linked issue, the postgresql 1.5 package is compatible
> with the cpp 3.x line, not 4.x. It should work fine with the last
> datasketches-cpp 3.x release.
>
> In the meantime, as noted, we are actively trying to work on speed
> improvements for HLL as requested at the start of this thread.
>
> Additionally, one thing that can help speed releases is to vote whenever
> there's a vote announcement -- even a non-binding vote is valuable!
>
> jon
>
> On Wed, Apr 19, 2023, 12:13 AM Bhowmick, Rima <[email protected]>
> wrote:
>
>> Hello All,
>>
>> We are trying to install a new version of datasketches in our postgres
>> instance. I have downloaded datasketches-postgresql 1.5.0
>> (apache-datasketches-postgresql-1.5.0-src.zip), datasketches-cpp 4.0.1
>> (apache-datasketches-cpp-4.0.1-src.zip) from the Apache website, and boost
>> 1.81.0. I have followed the steps as described in the readme file.
>> While executing the make command, I got an error:
>>
>> g++ -Wall -Wpointer-arith -Wendif-labels -Wmissing-format-attribute
>> -Wformat-security -fno-strict-aliasing -fwrapv -O2 -std=c++11 -fPIC -fPIC
>> -I/usr/local/include -Iboost -Idatasketches-cpp/common/include
>> -Idatasketches-cpp/kll/include -Idatasketches-cpp/cpc/include
>> -Idatasketches-cpp/theta/include -Idatasketches-cpp/fi/include
>> -Idatasketches-cpp/hll/include -Idatasketches-cpp/tuple/include
>> -Idatasketches-cpp/req/include -I. -I./
>> -I/pgbin/mbi1d/12.x/include/postgresql/server
>> -I/pgbin/mbi1d/12.x/include/postgresql/internal -D_GNU_SOURCE
>> -I/pgbin/mbi1d/12.x//include/libxml2 -c -o
>> src/kll_float_sketch_c_adapter.o src/kll_float_sketch_c_adapter.cpp
>>
>> src/kll_float_sketch_c_adapter.cpp:26:109: error: wrong number of template arguments (4, should be 3)
>> typedef datasketches::kll_sketch<float, std::less<float>, datasketches::serde<float>, palloc_allocator<float>> kll_float_sketch;
>> ^
>> In file included from src/kll_float_sketch_c_adapter.cpp:24:0:
>> datasketches-cpp/kll/include/kll_sketch.hpp:158:7: error: provided for ‘template<class T, class C, class A> class datasketches::kll_sketch’
>> class kll_sketch {
>>
>> It looks like there is a mismatch in the number of template arguments
>> between kll_float_sketch_c_adapter.cpp and kll_sketch.hpp.
>> Could you please suggest a solution? Thank you!
>>
>> https://github.com/apache/datasketches-postgresql/issues/62
>>
>> *The Datasketches distinct count postgres extension algorithm is used in
>> our applications to deliver significant business value, therefore if we
>> cannot upgrade the versions, it would be a big loss for us.*
>>
>> *Could you please guide us on the best approach to overcome this?*
>>
>> Thanks,
>>
>> Rima Bhowmick.
>>
>> *From: *Alexander Saydakov <[email protected]>
>> *Reply to: *"[email protected]" <[email protected]>
>> *Date: *Saturday, 15 April 2023 at 12:05 AM
>> *To: *"[email protected]" <[email protected]>
>> *Subject: *Re: [E] Postgres HLL is very slow
>>
>> I am not sure about the date. I think the development should take a few
>> days. A formal Apache release will take substantially more time just to go
>> through the required steps of voting for the core library release (not
>> really necessary for the parallel execution, but necessary to bring the
>> latest speed improvements into the PostgreSQL extension), and then going
>> through the same procedure to release the extension.
>>
>> Of course, you don't have to wait for the formal release to start testing.
>>
>> Could you clarify your issues building the latest version please? I
>> believe that the datasketches-postgresql code in the master branch is
>> compatible with the latest datasketches-cpp code.
>>
>> On Fri, Apr 14, 2023 at 11:22 AM Bhowmick, Rima <[email protected]>
>> wrote:
>>
>> Hello Alexander,
>>
>> Do you have a date in mind for releasing the parallel execution support?
>>
>> Also, we tried upgrading the datasketches version following the latest
>> documentation, and we are getting a lot of C++ version issues.
>>
>> It's very tough to install the new version. Any thoughts?
>>
>> Thanks,
>>
>> Rima Bhowmick.
>>
>> *From: *Alexander Saydakov <[email protected]>
>> *Reply-To: *"[email protected]" <[email protected]>
>> *Date: *Friday, 14 April 2023 at 10:58 PM
>> *To: *"[email protected]" <[email protected]>
>> *Subject: *Re: [E] Postgres HLL is very slow
>>
>> Hi Rima,
>>
>> I am working on the datasketches extension to support parallel queries
>> (distributed aggregation).
>>
>> I expect to get this done in a matter of days.
>>
>> Also, we have just made some improvements to HLL merge speed in the core
>> library. These changes have not been released yet, but are available in
>> the master branch.
>>
>> We have another HLL performance improvement in mind. I will work on it
>> once I finish the parallel query support.
>>
>> On Fri, Apr 14, 2023 at 3:33 AM Bhowmick, Rima <[email protected]>
>> wrote:
>>
>> Hello Team,
>>
>> Here is a snapshot of the existing application:
>>
>> TechStack: Postgres DB, Hive, Tableau UI
>>
>> Postgres Plugin: DataSketches
>>
>> Flow in brief:
>>
>> - A Hadoop data pipeline job pushes pre-aggregated (using the Hive
>> datasketches algorithm) active card data, along with other details, to
>> Hive.
>> - Another job populates that data into the Postgres DB, which finally
>> holds 3 years of data of 4 regions for multiple countries.
>> - The Tableau dashboard has a live connection to the Postgres DB.
>> - The Tableau query calls the Postgres DB to aggregate the
>> binary/pre-aggregated data to get a distinct card count (using the
>> DataSketches algorithm) and fetch data based on multiple filter
>> conditions.
>> - Usually the data would be of 3 years for a span of 2 months, which
>> means a total of 6 months of data to aggregate for a country on multiple
>> conditions.
>>
>> Usually this aggregation query response is quite slow. We have tried a
>> lot of different ways to resolve this.
>>
>> The datasketches part takes most of the execution time.
>>
>> Thanks & Regards,
>>
>> Rima Bhowmick
>>
>> Marketing Brand Analytics
