On Thu, Mar 11, 2010 at 10:18 PM, Yi Liang <white...@gmail.com> wrote:
> Hi St.Ack,
>
> Can hbase-1537 be applied to 0.20.3? It should be very useful, but with the
> patch applied, Scan.java and HRegion.java won't compile for me.
>

I took a quick look.  It looks like it wouldn't take too much
massaging to get the patch to apply to trunk.  What kind of errors are
you seeing?  If you get it to work, it'd be good to backport.

St.Ack


> Thanks,
> Yi
> On Tue, Mar 9, 2010 at 2:26 PM, Stack <st...@duboce.net> wrote:
>
>> On Mon, Mar 8, 2010 at 6:58 PM, William Kang <weliam.cl...@gmail.com>
>> wrote:
>> > Hi,
>> > Can you give me some more details about how the information in a row is
>> > fetched? I understand that a row of around 1.5 GB may be spread across
>> > multiple HFiles in a region server. If the client wants to access one
>> > column label's value in that row, what is going to happen?
>>
>> Only that cell is fetched if you specify an explicit column name
>> (column family + qualifier).
>>
>> > After HBase finds the region that stores this row, it goes to the
>> > region's metadata, finds the HFile that stores the column family, and
>> > the HFile's index gives the offsets of the key-value pairs. Then HBase
>> > can go to the key-value pair and get the value for a certain column label.
>> >
>>
>> Yes.
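The two-step lookup just confirmed can be sketched as a toy model in Java. Everything below (the `meta` and `store` maps, the key format) is hypothetical, made up only to mirror the steps described in the thread, not real HBase internals:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the read path: (1) find which region holds the row key,
// (2) fetch the cell by (family, qualifier) out of that region's store.
public class ReadPathSketch {
    // ".META."-like map: start-of-region row key -> region name
    static final Map<String, String> meta = new HashMap<>();
    // per-region store: "region/row/family:qualifier" -> value
    static final Map<String, String> store = new HashMap<>();

    // Step 1: a region covers all rows >= its start key, so pick the
    // largest start key that is <= the requested row.
    static String regionFor(String row) {
        String best = null;
        for (String start : meta.keySet())
            if (start.compareTo(row) <= 0 && (best == null || start.compareTo(best) > 0))
                best = start;
        return meta.get(best);
    }

    // Step 2: fetch just the one cell named by family + qualifier.
    static String get(String row, String family, String qualifier) {
        String region = regionFor(row);
        return store.get(region + "/" + row + "/" + family + ":" + qualifier);
    }

    public static void main(String[] args) {
        meta.put("", "region-A");
        meta.put("m", "region-B");
        store.put("region-B/row9/cf:label", "value-42");
        System.out.println(get("row9", "cf", "label")); // value-42
    }
}
```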
>>
>>
>> > Why does the whole row need to be read into memory?
>> >
>>
>> If you ask for the whole row, it will try to load it all to deliver it
>> all to you.  There is no "streaming" API per se.  Rather, a Result
>> object is passed from server to client which contains everything in the
>> row, keyed by column name.
>>
>> That said, if you want the whole row and you are scanning as opposed
>> to getting, TRUNK has hbase-1537 applied, which allows for intra-row
>> scanning -- you call setBatch to set the maximum number of cells returned
>> within a row -- and the 0.20 branch has HBASE-1996, which allows you to
>> set the maximum size returned on a next invocation (in both cases, if
>> the row is not exhausted, the next 'next' invocation will return more
>> out of the current row, and so on, until the row is exhausted).
>>
>> > If HBase does not read the whole row at once, what causes its
>> > inefficiency?
>>
>> I think Ryan is just allowing that the above means of scanning parts
>> of rows may have bugs that we've not yet squashed.
>>
>> St.Ack
>>
>>
>> > Thanks.
>> >
>> >
>> > William
>> >
>> > On Mon, Mar 8, 2010 at 3:44 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> At this time, truly massive rows such as the one you described
>> >> may behave non-optimally in hbase. While in previous versions of
>> >> HBase, reading an entire row required you to actually read
>> >> and send the entire row in one go, there is a new API that allows you
>> >> to effectively stream rows.  There are still some read paths that
>> >> may read more data than necessary, so your performance mileage may
>> >> vary.
>> >>
>> >>
>> >>
>> >> On Sun, Mar 7, 2010 at 3:56 AM, Ahmed Suhail Manzoor
>> >> <suhail...@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > This might prove to be a blatantly obvious question, but wouldn't it
>> >> > make sense to store large files directly in HDFS and keep the metadata
>> >> > about the file in HBase? One could, for instance, serialize the details
>> >> > of the HDFS file into a java object and store that in HBase. This object
>> >> > could expose the reading of the HDFS file, for instance, so that one is
>> >> > left with clean code. Is there anything wrong with implementing things
>> >> > this way?
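A minimal sketch of the pointer pattern suggested above, assuming a made-up `path|length` record format (not an HBase or HDFS API): the big file stays in HDFS, and only this small serialized record would be stored as the HBase cell value.

```java
// Hypothetical "pointer" record: the HBase cell holds only the HDFS
// location and size of the big file, never the file's bytes themselves.
public class HdfsPointer {
    final String path;   // HDFS location of the big file
    final long length;   // file size in bytes

    HdfsPointer(String path, long length) {
        this.path = path;
        this.length = length;
    }

    // Serialize to the bytes you would store in the HBase cell.
    byte[] toBytes() {
        return (path + "|" + length).getBytes();
    }

    // Rebuild the pointer from a cell value fetched back out of HBase.
    static HdfsPointer fromBytes(byte[] b) {
        String[] parts = new String(b).split("\\|");
        return new HdfsPointer(parts[0], Long.parseLong(parts[1]));
    }

    public static void main(String[] args) {
        HdfsPointer p = new HdfsPointer("/data/blobs/file-0001", 1610612736L);
        HdfsPointer q = HdfsPointer.fromBytes(p.toBytes());
        System.out.println(q.path + " " + q.length);
    }
}
```

This keeps row sizes tiny regardless of file size, at the cost of a second read (HBase for the pointer, then HDFS for the data) and of HBase no longer being able to guarantee atomicity between the metadata and the file contents.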
>> >> >
>> >> > Cheers
>> >> > su./hail
>> >> >
>> >> > On 07/03/2010 09:21, tsuna wrote:
>> >> >>
>> >> >> On Sat, Mar 6, 2010 at 9:14 PM, steven zhuang
>> >> >> <steven.zhuang.1...@gmail.com>  wrote:
>> >> >>
>> >> >>>
>> >> >>>          I have a table which may contain super big rows, e.g. with
>> >> >>> millions of cells in one row, 1.5GB in size.
>> >> >>>
>> >> >>>          Now I have a problem emitting data into the table, probably
>> >> >>> because these super big rows are too large for my regionserver (with
>> >> >>> only 1GB heap).
>> >> >>>
>> >> >>
>> >> >> A row can't be split, and whatever you do that needs that row (like
>> >> >> reading it) requires that HBase load the entire row in memory.  If
>> >> >> the row is 1.5GB and your regionserver has only 1GB of memory, it
>> >> >> won't be able to use that row.
>> >> >>
>> >> >> I'm not 100% sure about that because I'm still a HBase n00b too, but
>> >> >> that's my understanding.
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >
>>
>
