No lines of HBase were changed to run on MapR. MapR implements the HDFS
API and uses JNI to get local data. If HDFS wanted to, it could use more
sophisticated methods to get data rapidly from local disk into a client's
memory space...as MapR does.
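
For anyone curious what that looks like, here is a minimal sketch of the
JNI approach (all names hypothetical; this is not MapR's actual client,
which is proprietary): a native method that copies a range of a local
block file straight into a caller-supplied direct buffer, skipping the
DataNode socket path entirely.

    import java.nio.ByteBuffer;

    public class LocalBlockReader {
        static { System.loadLibrary("localread"); } // hypothetical native lib

        // Copies len bytes starting at offset of the local block file
        // into dst (a direct ByteBuffer), bypassing the DataNode
        // streaming path. Returns bytes read, or -1 at EOF.
        public static native int pread(String blockPath, long offset,
                                       ByteBuffer dst, int len);
    }

The native side can then pread(2) or mmap the block file, which is the
kind of fast local-disk-to-client-memory path described above.
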
On Jul 9, 2011 6:05 PM, "Doug Meil" <doug.m...@explorysmedical.com> wrote:
>
> re: "If a variant of HDFS-347 were committed,"
>
> I agree with what Ryan is saying here, and I'd like to second (third?
> fourth?) the call to keep pushing for HDFS improvements. Anything else
> is coding around the bigger I/O issue.
>
>
>
> On 7/9/11 6:13 PM, "Ryan Rawson" <ryano...@gmail.com> wrote:
>
>>I think my general point is: we could hack up the HBase source, add
>>refcounting, circumvent the GC, etc., or we could demand more from the DFS.
>>
>>If a variant of HDFS-347 were committed, reads could come from the Linux
>>buffer cache and life would be good.
>>
>>The choice isn't just fast HBase vs. slow HBase; there's an element of
>>bug risk there as well.
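>>
>>To make the HDFS-347 idea concrete, here is a minimal sketch (hypothetical
>>helper; the real patch wires this into DFSClient): when the client sits on
>>the same box as the block, get the local block file's path from the
>>datanode and read it directly, so repeat reads come straight out of the
>>Linux buffer cache.
>>
>>    import java.io.FileInputStream;
>>    import java.io.IOException;
>>
>>    public class ShortCircuitRead {
>>        // blockFilePath would come from the datanode over a control
>>        // RPC; here it is just a parameter for illustration.
>>        public static int preadLocal(String blockFilePath, long offset,
>>                                     byte[] buf) throws IOException {
>>            try (FileInputStream in = new FileInputStream(blockFilePath)) {
>>                in.getChannel().position(offset);
>>                return in.read(buf); // hot pages served by the page cache
>>            }
>>        }
>>    }
>>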
>>On Jul 9, 2011 12:25 PM, "M. C. Srivas" <mcsri...@gmail.com> wrote:
>>> On Fri, Jul 8, 2011 at 6:47 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
>>>
>>>> There are a couple of things here: one is using direct byte buffers to
>>>> put the blocks outside of the heap, the other is mmap'ing the blocks
>>>> directly from the underlying HDFS file.
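>>>>
>>>> A rough sketch of both options with plain java.nio (blockFile stands in
>>>> for the local path of an HDFS block, which a client can't actually see
>>>> today without something like HDFS-347):
>>>>
>>>> import java.io.IOException;
>>>> import java.io.RandomAccessFile;
>>>> import java.nio.ByteBuffer;
>>>> import java.nio.MappedByteBuffer;
>>>> import java.nio.channels.FileChannel;
>>>>
>>>> public class OffHeapOptions {
>>>>     // Option 1: direct buffer -- cache block data outside the GC'd
>>>>     // heap so full GCs don't have to trace or copy it.
>>>>     static final ByteBuffer SLAB = ByteBuffer.allocateDirect(64 * 1024 * 1024);
>>>>
>>>>     // Option 2: mmap the block file -- pages are shared with the OS
>>>>     // buffer cache and faulted in on demand.
>>>>     static MappedByteBuffer map(String blockFile) throws IOException {
>>>>         RandomAccessFile raf = new RandomAccessFile(blockFile, "r");
>>>>         try {
>>>>             return raf.getChannel()
>>>>                       .map(FileChannel.MapMode.READ_ONLY, 0, raf.length());
>>>>         } finally {
>>>>             raf.close(); // the mapping stays valid after close
>>>>         }
>>>>     }
>>>> }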
>>>
>>>
>>>> I think they both make sense. And I'm not sure MapR's solution will
>>>> be that much better if the latter is implemented in HBase.
>>>>
>>>
>>> There are some major issues with mmap'ing the local HDFS file (the
>>> "block") directly:
>>> (a) no checksums to detect data corruption from bad disks (see the
>>> sketch below)
>>> (b) when a disk does fail, the DFS could start reading from an alternate
>>> replica ... but that option is lost when mmap'ing, and the RS will crash
>>> immediately
>>> (c) security is completely lost, but that is minor given HBase's current
>>> status
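>>>
>>> To make (a) concrete: the normal DFS read path verifies a stored CRC for
>>> every chunk as it streams data (HDFS keeps a CRC32 per 512-byte chunk in
>>> a separate .meta file); a raw mmap just hands back whatever is on disk.
>>> A minimal sketch of the check that gets skipped:
>>>
>>>     import java.io.IOException;
>>>     import java.util.zip.CRC32;
>>>
>>>     public class ChunkVerifier {
>>>         static void verifyChunk(byte[] chunk, int len, long storedCrc)
>>>                 throws IOException {
>>>             CRC32 crc = new CRC32();
>>>             crc.update(chunk, 0, len);
>>>             if (crc.getValue() != storedCrc) {
>>>                 // with mmap there is no equivalent hook, so a bad disk
>>>                 // silently feeds corrupt bytes to the region server
>>>                 throw new IOException("checksum mismatch in block chunk");
>>>             }
>>>         }
>>>     }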
>>>
>>> For those HBase deployments that don't care about the absence of (a) and
>>> (b), especially (b), it's definitely a viable option that gives good
>>> perf.
>>>
>>> At MapR, we did consider a similar direct-access capability and rejected
>>> it due to the above concerns.
>>>
>>>
>>>
>>>>
>>>> On Fri, Jul 8, 2011 at 6:26 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>> > The overhead in a ByteBuffer is the extra integers that keep track of
>>>> > the mark, position, and limit.
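>>>> >
>>>> > One way to see that overhead for yourself (JOL, org.openjdk.jol:jol-core,
>>>> > is OpenJDK's object-layout tool; exact byte counts vary with the JVM and
>>>> > compressed-oops settings):
>>>> >
>>>> > import java.nio.ByteBuffer;
>>>> > import org.openjdk.jol.info.ClassLayout;
>>>> >
>>>> > public class BufferOverhead {
>>>> >     public static void main(String[] args) {
>>>> >         // byte[]: object header + length; heap ByteBuffer: header +
>>>> >         // mark/position/limit/capacity ints + array ref + offset
>>>> >         System.out.println(ClassLayout.parseInstance(new byte[0]).toPrintable());
>>>> >         System.out.println(
>>>> >             ClassLayout.parseInstance(ByteBuffer.allocate(0)).toPrintable());
>>>> >     }
>>>> > }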
>>>> >
>>>> > I am not sure that putting the block cache outside the heap is the
>>>> > way to go.
>>>> > Getting faster local DFS reads is important, and if you run HBase on
>>>> > top of MapR, these things are taken care of for you.
>>>> > On Jul 8, 2011 6:20 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com> wrote:
>>>> >> Also, it's for a good cause: moving the blocks out of the main heap
>>>> >> using direct byte buffers or some other more native-like facility (if
>>>> >> DBBs don't work).
>>>> >>
>>>> >> On Fri, Jul 8, 2011 at 5:34 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>>> >>> Where? Everywhere? An array is 24 bytes, a ByteBuffer is 56 bytes.
>>>> >>> Also the API is... annoying.
>>>> >>> On Jul 8, 2011 4:51 PM, "Jason Rutherglen" <jason.rutherg...@gmail.com> wrote:
>>>> >>>> Is there an open issue for this? How hard will this be? :)
>>>> >>>
>>>> >
>>>>
>
