Re: [FastBit-users] FastBit-users Digest, Vol 59, Issue 13

K. John Wu Tue, 24 Jul 2012 21:55:41 -0700

Reading data one row at a time is slow; therefore, a strict
implementation of a cursor is slow.  We don't actually use this option.
We try to pick a faster option when reading the raw data values: in
most cases, when a column is needed, all values of that column in a
data partition are read into memory (unless an alternative option is
faster).  In most cases, all selected columns from a data partition
are read into memory at the same time.  This means FastBit will use
more memory than you might expect.
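
To make the memory point concrete, here is a rough back-of-the-envelope
sketch in C++. It assumes fixed-width columns only and ignores indexes,
dictionaries, and any other buffers. PartitionInfo, bytesForPartition,
and peakBytes are hypothetical names for illustration, not part of the
FastBit API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical summary of one data partition.
struct PartitionInfo {
    std::uint64_t nRows;  // number of rows in the partition
};

// Bytes needed to hold every value of every selected fixed-width column
// of one partition in memory at the same time. `widths` lists the size in
// bytes of one value of each selected column (e.g. 4 for int32, 8 for double).
std::uint64_t bytesForPartition(const PartitionInfo& p,
                                const std::vector<std::size_t>& widths) {
    std::uint64_t rowBytes = 0;
    for (std::size_t w : widths) rowBytes += w;
    // Guard against overflow in the multiplication below.
    assert(rowBytes == 0 ||
           p.nRows <= std::numeric_limits<std::uint64_t>::max() / rowBytes);
    return p.nRows * rowBytes;
}

// When partitions are processed one at a time, the raw-value footprint
// peaks at the largest partition instead of the sum over all partitions.
std::uint64_t peakBytes(const std::vector<PartitionInfo>& parts,
                        const std::vector<std::size_t>& widths) {
    std::uint64_t peak = 0;
    for (const PartitionInfo& p : parts)
        peak = std::max(peak, bytesForPartition(p, widths));
    return peak;
}
```

For example, a 10-million-row partition with one 4-byte integer column
and one 8-byte double column selected comes to 120,000,000 bytes (about
114 MiB) of raw values alone.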


FastBit does not support an explicit ORDER BY through the command-line
interface.  You can call ibis::table::orderby from a C++ program.  As
with group-by, all intermediate data has to fit in memory in the
current implementation.  This should be faster, but it requires a bit
more memory.
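
A minimal sketch of why an in-memory ORDER BY needs all qualifying rows
at once. Row, PartitionResult, and orderByKey are hypothetical; this is
not how ibis::table::orderby is implemented, only an illustration of the
memory requirement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row type holding the selected values of one matching row.
struct Row {
    long key;          // the ORDER BY column
    std::string name;  // some other selected column
};

// Hypothetical stand-in for the rows of one partition that satisfied
// the WHERE clause.
using PartitionResult = std::vector<Row>;

// A naive in-memory ORDER BY: every matching row from every partition is
// copied into one buffer and sorted there, so the whole result set (plus
// the sort's working space) has to fit in memory at once.
std::vector<Row> orderByKey(const std::vector<PartitionResult>& parts) {
    std::size_t total = 0;
    for (const PartitionResult& p : parts) total += p.size();

    std::vector<Row> all;
    all.reserve(total);
    for (const PartitionResult& p : parts)
        all.insert(all.end(), p.begin(), p.end());

    // stable_sort keeps rows with equal keys in partition order.
    std::stable_sort(all.begin(), all.end(),
                     [](const Row& a, const Row& b) { return a.key < b.key; });
    assert(all.size() == total);
    return all;
}
```

An external (disk-based) merge sort would avoid holding everything in
memory at once, at the cost of extra I/O.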

Regarding your question 3, there is some logic in FastBit to decide
exactly when to use the index and when to use the raw data values.
Your thinking is basically right; however, the decisions are not as
clear-cut as you described, because the cost of using an index and the
cost of reading the raw data file are not easily measured.
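
The following toy rule shows the kind of trade-off involved, under
deliberately simplified assumptions: all costs are counted in bytes
read, and reading scattered candidate rows is assumed to cost the same
per byte as a sequential scan (exactly the sort of thing that makes real
costs hard to measure). CostInputs, Plan, and choosePlan are
hypothetical and do not reflect FastBit's actual decision logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plans for evaluating one range condition such as "B < 6".
enum class Plan { UseIndex, ScanRawData, CheckCandidatesInRawData };

// Hypothetical cost inputs for one column in one partition.
struct CostInputs {
    std::uint64_t indexBytesToRead;  // estimated part of the index touched
    std::uint64_t rawBytes;          // size of the raw column data
    std::uint64_t nRows;             // rows in the partition
    std::uint64_t candidates;        // rows still possible after earlier conditions
};

// Toy rule: choose whichever option reads the fewest bytes.
Plan choosePlan(const CostInputs& c) {
    assert(c.nRows > 0 && c.candidates <= c.nRows);
    const std::uint64_t bytesPerRow = c.rawBytes / c.nRows;
    // Pretends scattered reads cost the same per byte as a sequential scan.
    const std::uint64_t candidateBytes = c.candidates * bytesPerRow;
    if (candidateBytes < c.indexBytesToRead && candidateBytes < c.rawBytes)
        return Plan::CheckCandidatesInRawData;
    if (c.indexBytesToRead <= c.rawBytes) return Plan::UseIndex;
    return Plan::ScanRawData;
}
```

In this toy model, CheckCandidatesInRawData corresponds to point 1) of
question 3 (few rows survive "A>5"), UseIndex to point 2), and
ScanRawData to the case where the index is larger than the raw data.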

John


On 7/24/12 8:24 PM, Lidawei (Davey) wrote:
> Hi John,
> Thanks for your reply.
> 
> Some further questions:
> 1. As you explained, only the necessary data will be loaded into memory.
> Does that include the fields in the select clause? My understanding is that
> query results are located by row number, while the output fields are
> generated on the fly while going through the cursor. So the fields in the
> select clause do NOT exhaust memory. Is that correct?
> 
> 2. For multi-partition queries, you have explained "group by" clearly. How
> about "order by"? I think it also affects how memory is used. To implement
> "order by", all rows satisfying the query condition must be loaded into
> memory for sorting. Is that correct?
> 
> 
> 3. For a query condition like "A>5 and B<6", do you mean that column A and
> column B (or their index files) may be loaded into memory, but:
>   1) If A>5 already yields a small number of rows, then B<6 will be
> evaluated on the original data rather than on the index file?
>   2) If A>5 yields a large number of rows, then B<6 will be evaluated using
> the index file (or the original data if the index file is bigger)?
>   3) If an index file is required, it may be loaded partially by range; for
> example, the rows satisfying "B<6" may sit together in one range of the
> index file.
> Are these correct?
> 
> 
> Regards,
> Davey
> 
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]] 
> Sent: Wednesday, July 18, 2012 10:15 PM
> To: Lidawei (Davey)
> Cc: FastBit Users
> Subject: Re: FastBit-users Digest, Vol 59, Issue 13
> 
> Hi, Davey,
> 
> Sorry about the silence.  I have been working on a few other things
> and was unable to spend much time on fixing the problems with BLOBs.
> I should say that BLOBs are not meant to participate in any kind of
> operation.  We only intend to support reading and writing of BLOBs.
> 
> When answering a query, the necessary data is read into memory.
> What is necessary depends on the specific query.  For string-valued
> columns, the internal representation tries to stay with integers as
> much as possible.  One possible complication here is in processing
> group-bys.  For example, if a table is broken into two partitions and
> the group-by operation involves strings, then when the two partitions
> have the same dictionary for the strings, the group-by operation can
> proceed in the integer representation.  On the other hand, if the two
> dictionaries for the columns with the same name are different, then
> the group-by operation has to be carried out on the strings explicitly.
> 
> At this time, the group-by operation is carried out in memory.  In
> other words, the data needed for the group-by operation, including the
> dictionaries and other supporting information, must fit in memory.
> This includes all intermediate values, not just the final results of
> the group-by operation.
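
A small sketch of the dictionary issue for a COUNT(*) GROUP BY on a
string column split across partitions. EncodedColumn and countByString
are hypothetical names; the sketch assumes each partition stores its
strings as integer codes into a per-partition dictionary, and it keeps
every group in a std::map, so the whole intermediate result lives in
memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical dictionary-encoded string column of one partition:
// dict[code] is the string, and codes holds one code per row.
struct EncodedColumn {
    std::vector<std::string> dict;
    std::vector<std::uint32_t> codes;
};

// COUNT(*) GROUP BY the string column across all partitions.
std::map<std::string, std::uint64_t>
countByString(const std::vector<EncodedColumn>& parts) {
    std::map<std::string, std::uint64_t> counts;
    if (parts.empty()) return counts;

    bool sameDict = true;
    for (const EncodedColumn& p : parts)
        sameDict = sameDict && (p.dict == parts[0].dict);

    if (sameDict) {
        // Shared dictionary: group on the integer codes, decode at the end.
        std::vector<std::uint64_t> byCode(parts[0].dict.size(), 0);
        for (const EncodedColumn& p : parts)
            for (std::uint32_t c : p.codes) {
                assert(c < byCode.size());
                ++byCode[c];
            }
        for (std::size_t c = 0; c < byCode.size(); ++c)
            if (byCode[c] > 0) counts[parts[0].dict[c]] = byCode[c];
    } else {
        // Different dictionaries: the same code can mean different strings
        // in different partitions, so group on the decoded strings.
        for (const EncodedColumn& p : parts)
            for (std::uint32_t c : p.codes) {
                assert(c < p.dict.size());
                ++counts[p.dict[c]];
            }
    }
    return counts;
}
```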
> 
> FastBit works on one data partition at a time.  When a query
> involves 5 columns, it is possible that all 5 columns will have to be
> read into memory; exactly which columns are needed at the same time
> depends on the nature of the query.  For example, if you have an
> arithmetic expression involving these 5 columns, then all of them must
> be in memory in order to evaluate the expression.  If the 5 columns
> are involved in 5 separate simple range conditions (such as "A > 5 and
> B < 6 ..."), then most likely only part of the bitmap index associated
> with each column is read into memory.  Even in this case, FastBit may
> choose to read the raw data if the raw data is relatively small
> compared to the corresponding index.
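
A toy illustration of evaluating "A > 5 and B < 6" with one bitmap per
simple range condition, combined by a bitwise AND. It uses an
uncompressed, one-bitmap-per-distinct-value index; FastBit's real bitmap
indexes are compressed and may be binned, and BitmapIndex, buildIndex,
greaterThan, lessThan, and bitwiseAnd are hypothetical names. Each range
condition only touches the bitmaps for the values inside its range,
which is the sense in which an index may be read only partially.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using Bitvector = std::vector<bool>;

// Toy equality-encoded bitmap index: one uncompressed bitvector per
// distinct value, keyed by value in sorted order.
struct BitmapIndex {
    std::size_t nRows = 0;
    std::map<int, Bitvector> bitmaps;  // value -> rows holding that value
};

BitmapIndex buildIndex(const std::vector<int>& column) {
    BitmapIndex idx;
    idx.nRows = column.size();
    for (std::size_t r = 0; r < column.size(); ++r) {
        Bitvector& bv = idx.bitmaps[column[r]];
        if (bv.empty()) bv.assign(idx.nRows, false);
        bv[r] = true;
    }
    return idx;
}

using BitmapIter = std::map<int, Bitvector>::const_iterator;

// OR together the bitmaps in [first, last); only these bitmaps are read.
Bitvector orBitmaps(std::size_t nRows, BitmapIter first, BitmapIter last) {
    Bitvector result(nRows, false);
    for (; first != last; ++first)
        for (std::size_t r = 0; r < nRows; ++r)
            if (first->second[r]) result[r] = true;
    return result;
}

// Rows with value > v: the bitmaps for all keys above v.
Bitvector greaterThan(const BitmapIndex& idx, int v) {
    return orBitmaps(idx.nRows, idx.bitmaps.upper_bound(v), idx.bitmaps.end());
}

// Rows with value < v: the bitmaps for all keys below v.
Bitvector lessThan(const BitmapIndex& idx, int v) {
    return orBitmaps(idx.nRows, idx.bitmaps.begin(), idx.bitmaps.lower_bound(v));
}

// Combine two conditions on the same partition with a bitwise AND.
Bitvector bitwiseAnd(const Bitvector& x, const Bitvector& y) {
    assert(x.size() == y.size());
    Bitvector out(x.size(), false);
    for (std::size_t r = 0; r < x.size(); ++r) out[r] = x[r] && y[r];
    return out;
}
```

For A = {7, 2, 9, 6, 1} and B = {3, 8, 5, 6, 0}, the combined bitvector
marks rows 0 and 2.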
> 
> Hope this helps.
> 
> John
> 
> 
> 
> 
> On 7/18/12 5:06 AM, Lidawei (Davey) wrote:
>> Hi John,
>>
>> Am I bothering you? I have not received a reply from you for a long time.
>>
>> I found BLOB support mentioned in the following mailing-list digest. Does
>> BLOB support bitwise operators with which I could simulate an IN()
>> function, or will you implement that?
>>
>> Because my use case is an important service, I need to keep everything
>> under control, so I want to know the memory usage of FastBit. Details
>> follow:
>>
>> 1. Does FastBit read all rows that satisfy the condition?
>>
>> 2. Are only the columns in the select clause held in memory?
>>
>> 3. Is only a single partition in memory at any given time? If partitions
>> are processed one by one, how do you deal with order by?
>>
>> 4. For a string column, is only the position held in memory? If so, does
>> string-column query speed depend on the I/O workload of file reads?
>>
>>
>>
>> Best Regards,
>> Davey
>>
>> -----Original Message-----
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf Of 
>> [email protected]
>> Sent: Tuesday, July 17, 2012 3:01 AM
>> To: [email protected]
>> Subject: FastBit-users Digest, Vol 59, Issue 13
>>
>> Today's Topics:
>>
>>    1. Inserting binary (ibis::BLOB) data using
>>       tablex::appendRow(const ibis::table::row&)
>>       (Gerrit Hendrikus van Doorn)
>>    2. Re: Inserting binary (ibis::BLOB) data using
>>       tablex::appendRow(const ibis::table::row&) (K. John Wu)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sun, 15 Jul 2012 15:52:42 -0700
>> From: Gerrit Hendrikus van Doorn <[email protected]>
>> Subject: [FastBit-users] Inserting binary (ibis::BLOB) data using
>>      tablex::appendRow(const ibis::table::row&)
>> To: [email protected]
>> Message-ID:
>>      <CAL_hiPpjhEHjhkS21fUWv5eqk-=qvpwcjp64jybvunbn48i...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi,
>>
>> I've been using tablex::appendRow(const ibis::table::row&) to insert data
>> into a table.
>> I wanted to add some binary data using the ibis::BLOB type. When I tried
>> this, I got a segmentation fault. Is the BLOB type supported by FastBit? I
>> did a quick scan through tafel.cpp, and the appendRow method doesn't seem
>> to be doing anything with the ibis::BLOB type. Is there any other way to
>> insert binary data?
>>
>> Thanks,
>> Gerrit
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Mon, 16 Jul 2012 07:43:00 -0700
>> From: "K. John Wu" <[email protected]>
>> Subject: Re: [FastBit-users] Inserting binary (ibis::BLOB) data using
>>      tablex::appendRow(const ibis::table::row&)
>> To: FastBit Users <[email protected]>
>> Cc: Gerrit Hendrikus van Doorn <[email protected]>
>> Message-ID: <[email protected]>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Dear Gerrit,
>>
>> We have been working on straightening out the code that deals with
>> BLOBs.  Unfortunately, it may still take a couple of weeks.
>>
>> John
>>
>>
>>
>> ------------------------------
>>
>> _______________________________________________
>> FastBit-users mailing list
>> [email protected]
>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>
>>
>> End of FastBit-users Digest, Vol 59, Issue 13
>> *********************************************
>>
