Hi, I'd be happy to be a mentor, Stack :) As far as I know, the ASF already participates and you can sign up. [1] I was a mentor last year. I just sent an email to private and [email protected]. Would you like to check it?
[1] https://community.apache.org/gsoc.html#prospective-asf-mentors-read-this

2016-03-22 17:32 GMT-07:00 Enis Söztutar <[email protected]>:

>> I didn't sign up for GSoC, Talat. Not sure anyone else did either. Is it
>> too late for us to participate now?
>
> ASF participates in GSoC, so HBase automatically can participate AFAIK.
>
>> I'd mentor you (it'd be easy-peasy -- smile) but I think I've missed the
>> mentor signup deadline.
>
> I did not check the deadline. If that is the case, does it mean this year
> is over?
>
> Your list is pretty good. We can PoC with Cap'n Proto as well as grpc.
>
>> > BTW I talked with Enis Söztutar. He offered some topics for GSoC. These
>> > are:
>> > - He mentioned that data blocks are stored with PREFIX, FAST_DIFF, etc.
>> > encodings, but these encodings can only be used in the HFile context.
>> > In RPC and the WAL we use KeyValueEncoding for Cell blocks. He told me
>> > "You can improve them, or use the HFile encodings in RPC and the WAL."
>> > (He didn't give the issue number, but I guessed it is HBASE-12883,
>> > "Support block encoding based on knowing set of column qualifiers up
>> > front".)
>>
>> Sounds like a fine project (someone was just asking about this offline...)
>>
>> > - HBASE-14379 Replication V2
>> > - HBASE-8691 High-Throughput Streaming Scan API
>> > - HBASE-3529 Native Solr Indexer for HBase (he just mentioned HBase ->
>> > Solr indexing; I guess it could be this issue)
>> >
>> > Could you help me select topics, or could you suggest another issue?
>>
>> All of the above are good.
>>
>> Here are a few others made for another context:
>>
>> + Become a Jepsen distributed-systems test tool expert: run it against
>> HBase and HDFS. Analyze results. E.g. see
>> https://www.datastax.com/dev/blog/testing-apache-cassandra-with-jepsen
>> + Deep dive on HBase compactions. Own it. Review current options: the
>> defaults, the experimental, and the stale. Build tooling and surface
>> metrics that give better insight into the effectiveness of compaction
>> mechanics and policies. Develop tunings and alternate, new policies. For
>> further credit, develop a master-orchestrated compaction algorithm.
>> + Reimplement HBase append and increment as write-only with rollup on
>> read, or using CRDTs
>> (https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type)
>> + Make the HBase server async/event-driven/SEDA, moving it off its
>> current thread-per-request basis
>> + UI: build out more pages and tabs on the HBase master exposing more of
>> our cluster metrics (make the master into a metrics sink). Extra points
>> for views, histograms, or dashboards that are both informative AND
>> pretty (D3, etc.). A good benchmark would be subsuming the Hannibal tool
>> https://github.com/sentric/hannibal
>> + Build an example application on HBase for test and illustration: e.g.
>> use Jimmy Lin's/The Internet Archive's
>> https://github.com/lintool/warcbase to load Common Crawl regular
>> webcrawls https://commoncrawl.org/, or load HBase with Wikipedia, the
>> Flickr dataset, or any dataset that appeals. Extra credit for
>> documenting the steps involved and filing issues where the API is
>> awkward or hard to follow.
>> + Add actionable statistics to HBase internals that capture vitals about
>> the data being served and that we exploit when responding to queries;
>> e.g. rough sizes of rows, column families, columns-per-row-per-region,
>> etc. For example, if a client has been stepping sequentially through the
>> data, the stats would allow us to recognize this state so we could
>> switch to a different scan type, one that is optimal for a sequential
>> progression.
>> + Review and redo our fundamental merge sort, the basis of our read.
>> There are a few techniques to try, such as a "loser tree merge"
>> (http://sandbox.mc.edu/~bennet/cs402/lec/losedex.html), but ideally we'd
>> make our merge sort block-based rather than Cell-based. Set yourself up
>> in a rig and try different Cell formats to get yourself to a
>> cache-friendly Cell format that maximizes instructions per cycle.
>> + Our client is heavyweight and has accumulated lots of logic over time.
>> E.g. it is hard to set a single timeout for a request because the client
>> is layered, each layer with its own running timeouts. At its core is a
>> mostly-done async engine. Review, and finish, the async work. Rewrite
>> where it makes sense after analysis.
>> + Our RPC is based on protobuf Service, where we plugged in our own RPC
>> transport. An exploratory PoC putting HBase up on grpc was done by the
>> grpc team. Bring this project home. Extra points if you reveal a
>> streaming interface between client and server.
>> + Tiering: if regions are cold, close them so they don't occupy
>> resources (close files, purge their data from cache...); reopen when a
>> request comes in.
>> + Dynamic configuration of running HBase
>>
>> St.Ack
>>
>> > Thanks
>> > --
>> > Talat UYARER

--
Talat UYARER
Websitesi: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
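P.S. The append/increment idea in the list above (write-only with rollup on read, or CRDTs) can be sketched with a tiny PN-Counter: each replica records its increments and decrements in grow-only per-replica maps, the value is only rolled up when read, and merging takes per-replica maxima, so replicas converge regardless of delivery order. This is a generic sketch of the CRDT under those assumptions, not HBase code; the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal PN-Counter sketch: writes only append to per-replica totals;
// the counter value is computed ("rolled up") at read time.
class PNCounter {
    private final Map<String, Long> incs = new HashMap<>();
    private final Map<String, Long> decs = new HashMap<>();
    private final String replicaId;

    PNCounter(String replicaId) {
        this.replicaId = replicaId;
    }

    // Record a delta locally; no coordination with other replicas.
    void increment(long by) {
        if (by >= 0) {
            incs.merge(replicaId, by, Long::sum);
        } else {
            decs.merge(replicaId, -by, Long::sum);
        }
    }

    // Rollup on read: sum all increments, subtract all decrements.
    long value() {
        long total = 0;
        for (long v : incs.values()) total += v;
        for (long v : decs.values()) total -= v;
        return total;
    }

    // Merge another replica's state by taking per-replica maxima;
    // this is commutative, associative, and idempotent, so merge
    // order does not matter.
    void merge(PNCounter other) {
        other.incs.forEach((k, v) -> incs.merge(k, v, Long::max));
        other.decs.forEach((k, v) -> decs.merge(k, v, Long::max));
    }
}
```

After two replicas exchange states via merge(), both read the same value, which is the convergence property the project would need from any CRDT-backed increment.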
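P.P.S. For the merge-sort item, here is the baseline the "loser tree merge" would compete against: a k-way merge of sorted runs using a binary heap keyed on each run's head element. A loser tree does the same job with roughly half the comparisons per pop, since advancing a stream replays only one leaf-to-root path. This is a self-contained baseline sketch, not HBase's actual read path; names are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Baseline k-way merge over sorted integer runs using a binary heap.
class KWayMerge {
    static List<Integer> merge(List<List<Integer>> runs) {
        // Heap entries: {value, runIndex, positionInRun}.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[]{runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            // Advance the run the popped element came from.
            int r = top[1], next = top[2] + 1;
            if (next < runs.get(r).size()) {
                heap.add(new int[]{runs.get(r).get(next), r, next});
            }
        }
        return out;
    }
}
```

A GSoC project would swap the heap for a loser tree and, per the list above, try batching comparisons at block rather than Cell granularity.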
