On Tue, Dec 3, 2013 at 3:07 PM, Enis Söztutar <enis....@gmail.com> wrote:
> On Tue, Dec 3, 2013 at 2:03 PM, Jonathan Hsieh <j...@cloudera.com> wrote:
>> On Tue, Dec 3, 2013 at 11:42 AM, Enis Söztutar <enis....@gmail.com> wrote:
>>> On Mon, Dec 2, 2013 at 10:20 PM, Jonathan Hsieh <j...@cloudera.com> wrote:
>>>> Deveraj:
>>>> > Jonathan Hsieh, WAL per region (WALpr) would give you the locality
>>>> > (and hence HDFS short circuit) of reads if you were to couple it with
>>>> > the favored nodes. The cost is of course more WAL files... In the
>>>> > current situation (no WALpr) it would create quite some traffic cross
>>>> > machine, no?
>>>>
>>>> I think we all agree that WAL per region isn't efficient in today's
>>>> spinning-hard-drive world, where we are limited to a relatively low
>>>> budget of seeks (though it may be better in the future with SSDs).
>>>
>>> WALpr makes sense in a fully-SSD world and if HDFS had journaling for
>>> writes. I don't think anybody is working on this yet.
>>
>> What do you mean by journaling for writes? Do you mean where sync
>> operations update the length at the NN on every call?
>
> I think the HDFS guys were using "super sync" to refer to that. I was
> referring to a journaling file system
> (http://en.wikipedia.org/wiki/Journaling_file_system), where the writes to
> multiple files are persisted to a journal disk so that you do not pay the
> constant seeks for writing to a lot of files (the region WALs) in parallel.

Wait, we have a system that provides the ability to write data for a bunch
of buckets to a particular disk before rewriting it to others in a split-out,
read-optimized form... Isn't this what HBase and its HLog basically
provide? :)

Joking aside, can you give a quick example of the semantics it would have
so I can grok what you are talking about?

Jon

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// j...@cloudera.com
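For readers following along: the pattern being debated above (appending writes for many destinations sequentially to a single journal, then splitting them out into per-destination, read-optimized files later) can be sketched in a few lines. This is a toy model under assumptions, not HBase or file-system code; the `Journal` class, its methods, and the region names are all made up for illustration:

```python
# Toy sketch of the journal-then-split idea discussed in the thread:
# appends for many regions go to ONE sequential log (one seek pattern),
# and a later pass rewrites them into per-region buckets, analogous in
# spirit to HLog writing followed by log splitting. Hypothetical code.

from collections import defaultdict


class Journal:
    def __init__(self):
        # stands in for a single append-only file on one disk
        self.records = []

    def append(self, region, payload):
        # one sequential append regardless of which region is written,
        # avoiding a seek per destination file
        self.records.append((region, payload))

    def split(self):
        # rewrite journal contents into per-region buckets
        # (the read-optimized, per-file layout)
        per_region = defaultdict(list)
        for region, payload in self.records:
            per_region[region].append(payload)
        return per_region


journal = Journal()
journal.append("region-a", "put row1")
journal.append("region-b", "put row7")
journal.append("region-a", "put row2")

print(journal.split()["region-a"])  # → ['put row1', 'put row2']
```

The trade-off the thread is circling is exactly the one this sketch hides: the journal gives you sequential writes up front, at the cost of reading and rewriting everything once more during the split.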