Can someone explain the notes below?

bq. - hbase deman was with hadoop
bq. - scan kind of halfass, for hive.
bq. - finance sector

Thanks

On Tue, Mar 27, 2012 at 11:49 AM, Jonathan Hsieh wrote:
> People:
> Cloudera: Todd, Dave W, Shaneel M, Jonathan H, Himanshu, Greg C, Matteo B (remote)
> FB: Nicolas, Amir
>
> druba - ubase/hstore - transaction processing, through hive-hbase integration.
>
> hbase team with hdfs team.
> - hbase deman was with hadoop
>
> NY - carve out hunk of HBase to work on.
>
> Long term:
> real time hive, deep integration.
> - beyond just translate to MR job.
> - Use in megastore.
> - scan kind of halfass, for hive.
> - previously point query optimization.
> - analytics too long to scan table.
> - doing on demand compression.
>
> Edge cases
> - finance sector
> - gpu cases.
>
> Uptime and availability.
> - chaos monkey
> - poll all regions
>
> HBase 0.89 - fast region failover.
> - down time down to..
>
> Take down rack - test cases
>
> putting data node selection in master.
> - on per region basis, hash chain - so assigned secondary and tertiary.
>
> What is Cloudera focus?
>
> HDFS HA story
> - Talking to HW -- bookies in HDFS ("public story, but ...")
> - logs in hdfs.
> - Standby node.
> - zk flag - halfass solution. "double fails" not in scope.
> - todd: 3 journal daemons, quorum for edits, pluggable journal manager interface.
>
> Facebook - new data infrastructure
> - focus on quality, reliability, visibility.
> - upping rolling restart to improve monitoring
>
> HBase - stable depends on use case
> - pushing out use cases
> - ODS, (soon)
> - Puma analytics
> - ubase - researchy
> - site integrity
> - hash out cluster (generic kv store, persistent memcache), multi-tenant cluster, "photo stuff" (haystack)
> - wormhole - backup replication - on hashout cluster, master slave, cross DC replication.
>
> Replication - talk to Madu
>
> HDFS hard links - on github.
> - at data node layer.
> - hari m - HW - hard links also. (claims working prototype)
>
> Kannan -
>
> pubsub,
> 2ndary index.
> native c++ thrift client.
> open sourcing folly (c++ stl)
>
> - distribute log splitting task manager
> - ordering for bulk master operations, eliminate class of problems.
>
> Online schema changes
> - high friction to change
> - check column descriptor, then table, then configuration.
> - tune new features for column family.
>
> FB doesn't care about access control.
> - auditing - multi tenancy case.
> - specific app servers that will access - perms
> - FB will do security at a higher level
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
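For anyone puzzling over the HDFS HA item, "3 journal daemons, quorum for edits" describes a majority-quorum write. Here is a minimal Python sketch under my own assumptions, not the actual HDFS code: each edit is sent to every journal daemon and counts as committed once a strict majority (2 of 3) acknowledges it, so one daemon can be down without blocking the NameNode. `JournalDaemon`, `quorum_size` and `write_edit` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class JournalDaemon:
    """Hypothetical stand-in for one journal daemon; `up` simulates availability."""
    name: str
    up: bool = True
    entries: list = field(default_factory=list)

    def append(self, txid: int, edit: str) -> bool:
        # A daemon that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.entries.append((txid, edit))
        return True


def quorum_size(n: int) -> int:
    # Strict majority: 2 of 3, 3 of 5, ...
    return n // 2 + 1


def write_edit(daemons: list, txid: int, edit: str) -> bool:
    """Send the edit to every daemon; it is committed only if a majority acks."""
    acks = sum(1 for d in daemons if d.append(txid, edit))
    return acks >= quorum_size(len(daemons))


if __name__ == "__main__":
    jds = [JournalDaemon("jd1"), JournalDaemon("jd2"), JournalDaemon("jd3")]
    print(write_edit(jds, 1, "mkdir /a"))  # all 3 up -> True
    jds[2].up = False
    print(write_edit(jds, 2, "mkdir /b"))  # 2 of 3 is still a majority -> True
    jds[1].up = False
    print(write_edit(jds, 3, "mkdir /c"))  # only 1 of 3 -> False
```

The sketch deliberately leaves out fencing a stale writer and recovering edits that reached only a minority (like txid 3 on jd1 above); a real quorum journal has to handle both.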
