Looks like you folks had a good time there.  I wish I could've made it.
Good write-up too.

Thanks.

Jerry

On Mon, Aug 7, 2017 at 10:38 AM, ramkrishna vasudevan
<ramkrishna.s.vasude...@gmail.com> wrote:
> Thanks for the write up Stack. I could not make it to Shenzhen. Nice to
> know the conference and meet up went great.
>
> Regards
> Ram
>
> On Mon, Aug 7, 2017 at 9:36 PM, Stack <st...@duboce.net> wrote:
>
>> At fancy Huawei headquarters, 10:00 AM to 12:00 PM or so (with nice coffee and
>> fancy little cake squares provided about halfway through the session).
>>
>> For the list of attendees, see the picture at the end of this email.
>>
>> Discussion was mostly in Chinese with about 25% in English, plus some
>> gracious sideline translation, so the below is patchy. Hopefully you get the
>> gist.
>>
>> For a client-side scanner going against hfiles directly: is there a means of
>> passing permissions from hbase down to hdfs?
>>
>> Issues w/ the hbase 99th-percentile latency were brought up. "DynamoDB can do
>> 10ms". How to do better?
>>
>> SSD is not enough.
>>
>> GC messes us up.
>>
>> Will the Distributed Log Replay come back to help improve MTTR? We could
>> redo it on the new ProcedureV2 basis. ZK timeout is the biggest issue. Do as
>> we used to and just rely on the regionserver heartbeating...
>>
>> Read replica helps w/ MTTR.
>>
>> Ratis incubator project to do a quorum-based hbase?
>>
>> Digression on licensing issues around fb wangle and folly.
>>
>> A quorum-based redo of hbase would be another project altogether.
>>
>> Decided to go around the table to talk about concerns and what people are
>> working on.
>>
>> Jieshan wondered what could be done to improve OLAP over hbase.
>>
>> Client side scanner was brought up again as means of skipping RS overhead
>> and doing better OLAP.
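>>
>> (Illustrative aside, not something shown at the meetup: a minimal sketch of
>> what a client-side scan over a snapshot's hfiles might look like using
>> TableSnapshotScanner. The snapshot name and restore dir are made up, and the
>> client needs direct HDFS read access to the hbase data, which is exactly the
>> permissions question above.)
>>
>>   import org.apache.hadoop.conf.Configuration;
>>   import org.apache.hadoop.fs.Path;
>>   import org.apache.hadoop.hbase.HBaseConfiguration;
>>   import org.apache.hadoop.hbase.client.Result;
>>   import org.apache.hadoop.hbase.client.Scan;
>>   import org.apache.hadoop.hbase.client.TableSnapshotScanner;
>>
>>   public class SnapshotScanExample {
>>     public static void main(String[] args) throws Exception {
>>       Configuration conf = HBaseConfiguration.create();
>>       // Scratch dir on HDFS where snapshot metadata is restored (made-up path).
>>       Path restoreDir = new Path("/tmp/snapshot-restore");
>>       Scan scan = new Scan();
>>       // Reads the snapshot's hfiles directly from HDFS, skipping the RS layer.
>>       try (TableSnapshotScanner scanner =
>>           new TableSnapshotScanner(conf, restoreDir, "my_snapshot", scan)) {
>>         for (Result r = scanner.next(); r != null; r = scanner.next()) {
>>           System.out.println(r); // process one row
>>         }
>>       }
>>     }
>>   }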
>>
>> Have HBase compact to parquet files. Query parquet and hbase.
>>
>> At Huawei, they are using 1.0 hbase. Most problems are assignment. They
>> have 0.5M regions. RIT is a killer. Double assignment issues. And RIT. They
>> run their own services. Suggested they upgrade to get fixes at least, then
>> 2.0.
>>
>> Will HBase federate like HDFS? Can Master handle load at large scale? It
>> needs to do federation too?
>>
>> Anyone using bulk load replication? (Yes, it just works, so no one talks
>> about it...)
>>
>> Request that fixes be backported to all active branches, not just the most
>> current.
>>
>> Andrew was good at backporting... not all RMs are.
>>
>> Too many branches. What should we do?
>>
>> Proliferation of branches makes for too much work.
>>
>> Need to clean up bugs in 1.3. Make it the stable release now.
>>
>> Let's do more active EOL'ing of branches. 1.1?
>>
>> Hubert asked if we can have clusters where RSs are differently capable,
>> i.e. several generations of HW all running in the same cluster.
>>
>> What if a fat server goes down?
>>
>> The Balancer could take care of it all: RS capacity is something the
>> Balancer can take into account.
>> Regionserver labels, like YARN labels, to describe characteristics.
>>
>> Or run it all in docker when the cluster is heterogeneous. The K8s talk from
>> the day before was mentioned; we should all look at being able to deploy in
>> k8s and docker.
>>
>> Let's put out a kubernetes blog... (Doing).
>>
>> Alibaba is looking at HBase as a native YARN app.
>>
>> I/O is hard even with containers.
>>
>> Use the K8s autoscaler when there is a heavy user.
>>
>> Limit i/o use w/ a CP (coprocessor). Throttle.
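>>
>> (Illustrative aside: one existing lever for the throttle point is the
>> request-quota API; below is a rough sketch, assuming quota support is
>> enabled (hbase.quota.enabled=true). It throttles requests rather than raw
>> i/o, and the table name and limit are made up.)
>>
>>   import java.util.concurrent.TimeUnit;
>>   import org.apache.hadoop.hbase.HBaseConfiguration;
>>   import org.apache.hadoop.hbase.TableName;
>>   import org.apache.hadoop.hbase.client.Admin;
>>   import org.apache.hadoop.hbase.client.Connection;
>>   import org.apache.hadoop.hbase.client.ConnectionFactory;
>>   import org.apache.hadoop.hbase.quotas.QuotaSettingsFactory;
>>   import org.apache.hadoop.hbase.quotas.ThrottleType;
>>
>>   public class ThrottleExample {
>>     public static void main(String[] args) throws Exception {
>>       try (Connection conn =
>>               ConnectionFactory.createConnection(HBaseConfiguration.create());
>>            Admin admin = conn.getAdmin()) {
>>         // Cap reads on a (made-up) hot table at 1000 read requests per second.
>>         admin.setQuota(QuotaSettingsFactory.throttleTable(
>>             TableName.valueOf("heavy_table"), ThrottleType.READ_NUMBER,
>>             1000, TimeUnit.SECONDS));
>>       }
>>     }
>>   }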
>>
>> Spark and client-side scanner came up again.
>>
>> Snapshot input format in spark.
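>>
>> (Illustrative aside: a rough sketch of reading a snapshot into Spark via
>> TableSnapshotInputFormat, assuming a snapshot named "my_snapshot" and a
>> made-up restore dir; a Scan can also be put on the job config via
>> TableMapReduceUtil if row/column filtering is wanted.)
>>
>>   import org.apache.hadoop.conf.Configuration;
>>   import org.apache.hadoop.fs.Path;
>>   import org.apache.hadoop.hbase.HBaseConfiguration;
>>   import org.apache.hadoop.hbase.client.Result;
>>   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>>   import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
>>   import org.apache.hadoop.mapreduce.Job;
>>   import org.apache.spark.SparkConf;
>>   import org.apache.spark.api.java.JavaPairRDD;
>>   import org.apache.spark.api.java.JavaSparkContext;
>>
>>   public class SnapshotSparkExample {
>>     public static void main(String[] args) throws Exception {
>>       Configuration conf = HBaseConfiguration.create();
>>       Job job = Job.getInstance(conf);
>>       // Point the input format at the snapshot and a scratch restore dir.
>>       TableSnapshotInputFormat.setInput(job, "my_snapshot",
>>           new Path("/tmp/snapshot-restore"));
>>       JavaSparkContext jsc =
>>           new JavaSparkContext(new SparkConf().setAppName("SnapshotScan"));
>>       // Each record is (row key, Result), read from hfiles, not RegionServers.
>>       JavaPairRDD<ImmutableBytesWritable, Result> rdd = jsc.newAPIHadoopRDD(
>>           job.getConfiguration(), TableSnapshotInputFormat.class,
>>           ImmutableBytesWritable.class, Result.class);
>>       System.out.println("rows: " + rdd.count());
>>       jsc.stop();
>>     }
>>   }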
>>
>> HBase federation came up again. jd.com talking of 3k to 4k nodes in a
>> cluster. Millions of regions. Region assignment is messing them up.
>>
>> Maybe federation is a good idea? Argument that it is too much operational
>> complexity. Can we fix master load w/ splittable meta, etc.?
>>
>> It was brought up that even w/ 100s of RSs there is a scale issue, never mind
>> thousands.
>>
>> Alibaba talked about disaster recovery. They described an issue where HDFS
>> had a fencing problem during an upgrade. There was no active NN. All RSs went
>> down. ZK is another point of failure when it is not available. Operators were
>> being asked how much longer the cluster was going to be down, but they could
>> not answer the question. There are no indicators from HBase on how much
>> longer it will be down, how many WALs it has processed, or how many more are
>> to go. The operator was unable to tell his org how long it would be before it
>> all came back online. HBase should say how many regions are online and how
>> many more remain to do.
>>
>> Alibaba use SQL to lower cost. The HBase API is low-level. Row-key
>> construction is tricky. New users make common mistakes. If you don't do the
>> schema right, high performance is difficult.
>>
>> Alibaba are using a subset of Phoenix... simple SQL only; it throws
>> exceptions if the user tries to do joins, etc., anything but basic ops.
>>
>> HareQL is using hive for its meta store. HBase doesn't have data typing.
>>
>> HareQL could perhaps contribute some piece... or a module in hbase to do
>> SQL... from phoenix?
>>
>> Secondary index.
>>
>> The client is complicated in phoenix. It was suggested the thin client just
>> do the parse... and then offload to the server for optimization and execution.
>>
>> Then secondary indexes. They need a transaction engine for consistency of the
>> secondary index.
>>
>> We adjourned.
>>
>> Your dodgy secretary,
>> St.Ack
>> P.S. Please add to this base set of notes if I missed anything.
