People:
Cloudera: Todd, Dave W, Shaneel M, Jonathan H, Himanshu, Greg C, Matteo B
(remote)
FB: Nicolas, Amir
Dhruba - ubase/hstore - transaction processing, through hive-hbase
integration.
hbase team with hdfs team.
- hbase demand was with hadoop
NY - carve out hunk of HBase to work on.
Long term:
real time hive, deep integration.
- beyond just translate to MR job.
- Use in megastore.
- scan kind of halfass, for hive.
- previously point query optimization.
- analytics too long to scan table.
- doing on demand compression.
Edgecases
- finance sector
- gpu cases.
Uptime and availability.
- chaos monkey
- poll all regions
HBase 0.89 - fast region failover.
- down time down to..
Take down rack - test cases
putting data node selection in master.
- on per region basis, hash chain - so assigned secondary and tertiary.
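A rough sketch of what per-region, hash-based datanode selection could look like. The notes only say "hash chain"; this uses rendezvous (highest-random-weight) hashing as a stand-in, which is an assumption, not the design discussed. The function name, node names, and region name are all hypothetical.

```python
import hashlib

def placement_for_region(region_name, datanodes, replicas=3):
    """Return an ordered list of datanodes: primary, secondary, tertiary.

    Assumption: each (region, datanode) pair is hashed and nodes are ranked
    by that hash, so the ordering is deterministic per region.
    """
    def score(node):
        key = f"{region_name}|{node}".encode("utf-8")
        return hashlib.sha1(key).hexdigest()

    ranked = sorted(datanodes, key=score, reverse=True)
    return ranked[:replicas]

# Hypothetical cluster and region name, for illustration only.
nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
primary, secondary, tertiary = placement_for_region("usertable,row-0500", nodes)
```

With this scheme, losing a datanode that was not chosen for a region leaves that region's primary/secondary/tertiary assignment unchanged.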
What is Cloudera focus?
HDFS HA story
- Talking to HW -- bookies in HDFS ("public story, but ...")
- logs in hdfs.
- Standby node.
- zk flag - halfass solution. "double fails" not in scope.
- Todd: 3 journal daemons, quorum for edits, pluggable journal manager
interface.
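A minimal sketch of the "3 journal daemons, quorum for edits" idea behind a pluggable interface. Every class and method name here is hypothetical and is not the HDFS API; the journals are in-memory stand-ins for remote daemons.

```python
from abc import ABC, abstractmethod

class JournalManager(ABC):
    """Hypothetical pluggable interface: anything that can persist an edit."""

    @abstractmethod
    def log_edit(self, txid, edit):
        """Persist one edit or raise IOError."""
        raise NotImplementedError

class InMemoryJournal:
    """Stand-in for one journal daemon; 'healthy' simulates reachability."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.edits = {}

    def write(self, txid, edit):
        if not self.healthy:
            return False
        self.edits[txid] = edit
        return True

class MajorityJournalManager(JournalManager):
    """An edit counts as durable once a majority of journals acknowledge it."""

    def __init__(self, journals):
        self.journals = journals

    def log_edit(self, txid, edit):
        acks = sum(1 for j in self.journals if j.write(txid, edit))
        needed = len(self.journals) // 2 + 1
        if acks < needed:
            raise IOError(f"txid {txid}: {acks} acks, need {needed}")
        return acks

# Three journals, one down: the edit still reaches a quorum of two.
manager = MajorityJournalManager(
    [InMemoryJournal(), InMemoryJournal(), InMemoryJournal(healthy=False)]
)
ack_count = manager.log_edit(1, "mkdir /a")
```

With three journals, one failure is tolerated; a second failure makes `log_edit` raise, which matches the "double fails not in scope" framing only loosely.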
Facebook - new data infrastructure
- focus on quality, reliability, visibility.
- upping rolling restart to improve monitoring
HBase - stability depends on use case
- pushing out use cases
- ODS, (soon)
- Puma analytics
- ubase - researchy
- site integrity
- hash out cluster (generic kv store, persistent memcache ), multi-tenant
cluster, "photo stuff" (haystack)
- wormhole - backup replication - on hashout cluster, master slave, cross
DC replication.
Replication - talk to Madu
HDFS hard links - on github.
- at data node layer.
- hari m - HW - hard links also. (claims working prototype)
Kannan -
pubsub,
2ndary index.
native c++ thrift client.
open sourcing folly (c++ stl)
- distributed log splitting task manager
- ordering for bulk master operations, eliminate class of problems.
Online schema changes
- high friction to change
- check column descriptor, then table, then configuration.
- tune new features for column family.
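The lookup order above (column descriptor, then table, then configuration) can be sketched as a simple fallback chain. This is an illustration only: plain dicts stand in for the descriptors, and the function name and setting keys are made up.

```python
def resolve_setting(key, column_family, table, site_config, default=None):
    """Return the most specific value for key.

    Order of precedence: column family descriptor, then table descriptor,
    then cluster-wide configuration, then the supplied default.
    """
    for scope in (column_family, table, site_config):
        if key in scope:
            return scope[key]
    return default

# Hypothetical settings, for illustration only.
site_config = {"compression": "none", "blocksize": 65536}
table = {"compression": "gz"}
column_family = {"blocksize": 16384}

compression = resolve_setting("compression", column_family, table, site_config)
blocksize = resolve_setting("blocksize", column_family, table, site_config)
```

Here `compression` comes from the table level and `blocksize` from the column family, so a new feature could be tuned for one column family without touching the cluster-wide configuration.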
FB doesn't care about access control.
- auditing - multi tenancy case.
- specific app servers that will access - perms
- FB will do security at a higher level
--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera