Folks, I am still wondering why an LSH (Locality Sensitive Hashing) based partitioning scheme would provide better scalability than a partitioning scheme based on a normal cryptographic hash. Is there a chance that LSH would offer better performance than the cryptographic-hash approach?
Best,
Ketan

On Mon, Mar 1, 2010 at 9:15 PM, Eli Collins <e...@cloudera.com> wrote:
> On Mon, Mar 1, 2010 at 5:42 PM, Ketan Dixit <ketan.di...@gmail.com> wrote:
>> Hello,
>> Thank you Konstantin and Allen for your replies. The information
>> you provided really helped improve my understanding.
>> However, I still have a few questions.
>> How are symlinks/soft links used to solve the problem of partitioning?
>> (Where do the symlinks point to? All the mapping is
>> stored in memory, but symlinks point to file objects? This is a little
>> confusing to me.)
>> Can you please provide some insight into this?
>
> The idea is to use symlinks to present a single namespace to clients
> that is backed by multiple file systems (HDFS or other supported
> Hadoop file systems). E.g. a "root" HDFS file system could contain links
> to other file systems: /dir1 could point to S3, /dir2 could point
> to a local file system, /dir3 could point to another HDFS file system,
> etc. Clients always contact the "root" HDFS file system but are
> transparently redirected to other file systems by symlinks. This way a
> single namespace is partitioned across multiple file systems, but the
> client only needs to know about the root file system. This
> partitioning is static (you have to establish the symlinks), though
> you can grow it on the fly by adding file systems and links that point to
> them.
>
> Thanks,
> Eli
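To make the static symlink partitioning concrete, here is a minimal Python sketch. It is not a Hadoop API. RootNamespace, add_link and resolve are hypothetical names, and every URI (including the root's hdfs://root-nn) is a made-up placeholder. The root file system holds links from top-level paths to other file systems. A client path is resolved by finding the longest link that is a whole-component prefix of it, which mimics the client being redirected when it traverses a symlink. Paths under no link stay on the root file system.

```python
from dataclasses import dataclass, field


@dataclass
class RootNamespace:
    """Hypothetical "root" file system whose links point at other file systems."""

    # URI of the root file system itself (placeholder value).
    root_uri: str = "hdfs://root-nn"
    # Maps a link path in the root namespace (e.g. "/dir1") to a target URI prefix.
    links: dict = field(default_factory=dict)

    def add_link(self, link_path: str, target_uri: str) -> None:
        # Growing the namespace: put a new file system behind a new link.
        self.links[link_path.rstrip("/")] = target_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        # Choose the longest link that matches whole path components, so that
        # "/dir1" matches "/dir1/a" but not "/dir10/a".
        best = None
        for link in self.links:
            if path == link or path.startswith(link + "/"):
                if best is None or len(link) > len(best):
                    best = link
        if best is None:
            # No link on the path: the root file system serves it directly.
            return self.root_uri + path
        # Redirect: swap the link prefix for the target file system's URI.
        return self.links[best] + path[len(best):]


if __name__ == "__main__":
    ns = RootNamespace()
    ns.add_link("/dir1", "s3://bucket-a")          # like /dir1 -> S3
    ns.add_link("/dir2", "file:///data")           # like /dir2 -> local FS
    ns.add_link("/dir3", "hdfs://other-nn:8020/")  # like /dir3 -> another HDFS
    print(ns.resolve("/dir1/logs/a.txt"))  # s3://bucket-a/logs/a.txt
    print(ns.resolve("/dir3/x"))           # hdfs://other-nn:8020/x
    print(ns.resolve("/dir10/y"))          # hdfs://root-nn/dir10/y
```

In this sketch, adding capacity is just another add_link call. That mirrors the point above: the partitioning is fixed by the links someone sets up, but it can grow on the fly. Clients only ever need to know the root.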