Hi, Fred,

Yes, ibis::table::select should be the best option for your use case.
With 20M rows, most of the queries should complete in a second or
less.  Let us know if you encounter any performance problems.

John


On 9/18/14 6:56 AM, Fred Oko wrote:
> We currently use fastbit in read-only mode from our server interface
> (currently in Java) so the aspect of "requires data to be on disk
> already" is our norm as we ETL from our raw data into fastbit index
> before querying (if I am correct in assuming that is all that you
> meant). So for now the C API over JNI provides most of what we need
> except for the optimal means of aggregate queries and some occasional
> concurrency issues not yet fully debugged. However, I recognize that the
> current JNI interface and its C API underpinning are imperfect. While we
> have examined other options to wrap access to the underlying C++
> library in Java or Go, we may ultimately just write a simple C++ server
> to handle concurrent queries. Language selection mostly comes down
> to maintainability on our end, and C++ is not in our usual mix.
> Regardless of how we proceed, is it correct that the thula example's
> use of ibis::table::select is the optimal starting point for a
> read-only server query interface? The goals are sub-second query
> response on 20M-record table partitions with a concurrency target of
> at least 100 req/s, where the aggregate result set will commonly be
> 1 row and no more than 1000 rows with a group-by. ;) Too specific a
> question? In part I am surprised there are no currently
> available server implementations already exposing fastbit
> functionality -- it offers a much more focused and accessible solution
> compared to columnar MPP DB especially for already aggregated data.
> 
> On Wed, Sep 17, 2014 at 11:55 AM, K. John Wu <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     Hi, Fred,
> 
>     Thanks for your interest in FastBit software.  If you are planning to
>     extend FastBit in some way, it would be much better to do it in C++.
>     The Java API is based on a very old C API that requires data to be on
>     disk already.
> 
>     John
> 
> 
>     On 9/10/14 7:33 AM, Fred Oko wrote:
>     > The aim is to access aggregates via JNI more efficiently, without
>     > having to pull back all the hit values via get_qualified_ints etc.
>     >
>     > I started by just attempting to add support for
>     > fastbit_build_result_set and passing a select clause with aggregates.
>     > But:
>     > 1) this returns the aggregate for the wrong column (e.g. if asking
>     > for sum(colB) it would return a sum for a different column, as was
>     > visible from the returned aggregate and from the debug logging
>     > showing access to said other column, apparently selected
>     > lexicographically)
>     > 2) as one starts tracing where that went wrong, one realizes this
>     > method will return a record with those aggregates for each hit, as
>     > that is what the result set would contain. Given that this seems
>     > inefficient, and based on the query.h comment "If any additional
>     > functions are needed in the select clause, use the function
>     > ibis::table::select instead of using this class", I turned to that.
>     >
>     > From there I took the thula example as a better starting point than
>     > tcapi and did manage to get the functionality of thula doQuery into
>     > the capi and access it via JNI. However, I want to make certain what
>     > would be the best way to proceed now that I have validated this is
>     > feasible.
>     > 1) do you agree ibis::table::select is optimal for the case of
>     > wanting a couple of aggregates over a couple of columns (not
>     > necessarily the same as in the where clause) for a set of where
>     > clauses against a table?
>     > 2) do you have a recommendation on how best to expose this? Adding
>     > the table facade to FastBitQuery seems cleanest, but for now I'm
>     > just exposing a specialized function
>     > 3) do you have any arguments against adding a count(*) to the select
>     > clause to have one complete select response instead of having to
>     > mix in a separate request for num hits? It appears that if using
>     > table.select I won't need to use the query interface, and that the
>     > table mechanisms for computing hits are separate
>     > 4) any concerns with this approach regarding memory cleanup or
>     > optimizations, given that these queries will be run within a
>     > long-lived container?
>     >
>     > Thx in advance
>     >
>     >
>     > _______________________________________________
>     > FastBit-users mailing list
>     > [email protected] <mailto:[email protected]>
>     > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>     >
