Hi, Fred,

Yes, ibis::table::select should be the best option for your use case. With 20M rows, most of the queries should complete in a second or less. Let us know if you encounter any performance problems.
John

On 9/18/14 6:56 AM, Fred Oko wrote:
> We currently use fastbit in read-only mode from our server interface
> (currently in Java), so "requires data to be on disk already" is our
> norm: we ETL from our raw data into fastbit indexes before querying
> (if I am correct in assuming that is all that you meant). So for now
> the C API over JNI provides most of what we need, except for an
> optimal means of running aggregate queries and some occasional
> concurrency issues not yet fully debugged. However, I recognize that
> the current JNI interface and its C API underpinning are imperfect.
> While we have examined other options to wrap access to the underlying
> C++ library in Java or Go, we may ultimately just write a simple C++
> server to handle concurrent queries -- language selection mostly
> comes down to maintainability on our end, and C++ is not in our usual
> mix.
>
> Regardless of how we proceed, is it correct that the thula example's
> use of ibis::table.select is the optimal starting point to implement
> a read-only server query interface, where the goals are maintaining
> sub-second query response from 20M-record table partitions with a
> concurrency target of at least 100 req/s, and where the aggregate
> result set will commonly be 1 row and not more than 1000 rows with a
> group-by? ;) Too specific a question?
>
> In part I am surprised there are no currently available server
> implementations already exposing fastbit functionality -- it offers a
> much more focused and accessible solution compared to a columnar MPP
> DB, especially for already aggregated data.
>
> On Wed, Sep 17, 2014 at 11:55 AM, K. John Wu <[email protected]> wrote:
> >
> > Hi, Fred,
> >
> > Thanks for your interest in FastBit software. If you are planning
> > to extend FastBit in some way, it would be much better to do it in
> > C++. The Java API is based on a very old C API that requires data
> > to be on disk already.
> >
> > John
> >
> > On 9/10/14 7:33 AM, Fred Oko wrote:
> > > Aim is to be able to access aggregates via JNI with greater
> > > efficiency, by not having to pull back all the hit values via
> > > get_qualified_ints etc.
> > >
> > > I started by just attempting to add support for
> > > fastbit_build_result_set and passing a select clause with
> > > aggregates. But:
> > > 1) this returns the aggregate for the wrong column (e.g. if
> > > asking for sum(colB) it would return a sum for a different
> > > column, as was visible from the returned aggregate and from the
> > > debug logging showing access to said other column (apparently
> > > lexicographically selected))
> > > 2) as one starts tracing where that went wrong, one realizes this
> > > method will return a record with those aggregates for each hit,
> > > as that is what the result set would contain -- given that it
> > > seems inefficient, and based on the query.h comment "If any
> > > additional functions are needed in the select clause, use the
> > > function ibis::table::select instead of using this class", I
> > > turned to that
> > >
> > > From there I took the thula example as a better starting point
> > > over tcapi and did manage to get the functionality of thula
> > > doQuery into the capi and access it via JNI. However, I want to
> > > make certain what would be the best way to proceed now that I
> > > have validated this was feasible.
> > > 1) do you agree ibis::table.select is optimal for the case of
> > > wanting a couple of aggregates over a couple of columns (not
> > > necessarily the same as in the where clause) for a set of where
> > > clauses against a table?
> > > 2) do you have a recommendation on how to best expose this --
> > > adding the table facade to FastBitQuery seems cleanest, but for
> > > now I'm just exposing a specialized function
> > > 3) do you have any arguments against adding a count(*) to the
> > > select clause to have one complete select response instead of
> > > having to mix in a separate request for num hits? -- it appears
> > > that if using table.select I won't need to use the query
> > > interface, and the table mechanisms are separate for computing
> > > hits
> > > 4) any concerns with this approach on memory cleanup or
> > > optimizations, given that these queries will be run within a
> > > long-lived container?
> > >
> > > Thx in advance
> > >
> > > _______________________________________________
> > > FastBit-users mailing list
> > > [email protected]
> > > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
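
The distinction raised in the thread, one result row per hit versus one row per group, can be illustrated without FastBit at all. The sketch below is plain standard C++, not FastBit code: `Row`, `GroupAggregate`, and `aggregateByRegion` are hypothetical names, and the columns `region`, `bytes`, and `status` are invented for illustration. It only shows what a select clause of the form `region, count(*), sum(bytes)` combined with a where clause is expected to compute, and why including count(*) yields the hit count in the same response.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record standing in for one row of a table partition.
struct Row {
    std::string region;  // group-by key
    std::int64_t bytes;  // column being summed
    int status;          // column used only in the where clause
};

// One output row per group, holding count(*) and sum(bytes).
struct GroupAggregate {
    std::uint64_t count = 0;
    std::int64_t sum = 0;
};

// Single-pass equivalent of
//   select region, count(*), sum(bytes) where status = wanted
// The result has one entry per distinct region among the hits,
// no matter how many rows satisfy the condition.
std::map<std::string, GroupAggregate>
aggregateByRegion(const std::vector<Row>& rows, int wanted) {
    std::map<std::string, GroupAggregate> out;
    for (const Row& r : rows) {
        if (r.status != wanted) continue;   // where clause
        GroupAggregate& g = out[r.region];  // zero-initialized on first hit
        ++g.count;                          // count(*)
        g.sum += r.bytes;                   // sum(bytes)
    }
    return out;
}
```

With four input rows, of which three match the condition and fall into two regions, the function returns two entries whose counts add up to three, the number of hits, so no separate hit-count request is needed in this model.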
