Re: Multi get/put

Marcus Herou Sun, 10 Aug 2008 19:57:46 -0700

Hi Jun and thanks!

We are in the middle of the process of setting up our "SAN" servers and I
will run both Bonnie++ and IOZone tests on them as soon as we are done and
before we put them into production. Another good thing about GlusterFS is
that the community around it is great and everyone is focused on
performance, e.g.
http://www.gluster.org/docs/index.php/Guide_to_Optimizing_GlusterFS


I have used GlusterFS as a replacement for NFS for our webapps and it has
not failed me yet :) I was so impressed that I decided to use GlusterFS
without even considering Lustre, KosmosFS etc., which I have had real
trouble with in the past.

Perhaps not a fully satisfying answer, but look here:
http://www.gluster.org/docs/index.php/GlusterFS_1.3.1_-_64_Bricks_Aggregated_I/O_Benchmark

The only tricky part of setting up GlusterFS is getting the patched FUSE
running on Ubuntu Hardy, and that step is optional. The patched FUSE gives
additional performance, so you would want to get it installed, but
GlusterFS works just fine with the FUSE that comes with the kernel.

The installation of GlusterFS takes less than 10 mins and configuration
perhaps 30-60 mins the first time.

I have been working with big NetApp solutions, PolyServe etc. and everyone I
speak to says the same thing: GlusterFS rocks!
Our hardware supplier, Southpole.se, also specializes in distributed
computing, and many of their customers at technical universities around
Sweden are moving from Lustre, "the industry standard", to GlusterFS.

Kindly

/Marcus



On Sun, Aug 10, 2008 at 6:44 PM, Jun Rao <[EMAIL PROTECTED]> wrote:

> Marcus,
>
> I found your discussion on distributed file systems very interesting. Could
> you shed light on how those file systems compare (HDFS, KFS, Lustre,
> GlusterFS, etc)? Do they all support locality as HDFS does? How easy is the
> setup? What about the read/write performance (both sequential and random
> I/O)? Thanks,
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA  95120-6099
>
> [EMAIL PROTECTED]
> (408)927-1886 (phone)
> (408)927-3215 (fax)
>
>
> "Marcus Herou" <[EMAIL PROTECTED]> wrote on 08/09/2008 04:40:46 AM:
>
> > Hi.
> >
> > Cool! This is a much lower level and probably better approach than ours.
> > We now have a functional index, which however only supports primitive
> > types, not free-text indexing. It can store duplicates of data in the
> > index for fast retrieval. It is mostly used as a test of how to scale
> > indexing alongside HBase. In the end we will probably stick with Lucene.
> >
> > In the end we will probably also subclass HRegion, HTable etc., but for
> > now we have a system that uses the existing framework instead.
> >
> > I understand that you would like to use HDFS for storing stuff... But
> > have you tried GlusterFS?
> >
> > It is so simple and really works as a normal POSIX file system. We will
> > store our Solr-based index files in GlusterFS. Actually, I think we will
> > use GlusterFS as the storage mechanism for HDFS as well :) Stupid, but we
> > have some very capable storage machines which are much faster than a
> > bunch of local machines.
> >
> > The community should really spend some time looking at what is, to my
> > knowledge, the first clustered file system that will lower storage costs
> > and make SAN a commodity. Yes, we have Lustre; yes, we have KosmosFS; but
> > have you ever tried to install Lustre? Phew... Enough about GlusterFS,
> > this is an HBase mailing list :)
> >
> > Kindly
> >
> > //Marcus
> >
> > On Tue, Aug 5, 2008 at 4:58 PM, Ning Li <[EMAIL PROTECTED]> wrote:
> >
> > > We have been working on supporting Lucene-based index in HBase.
> > > In a nutshell, we extend the region to support indexing on column(s).
> > >
> > > We have a working implementation of our design. An overview of our
> > > design and the preliminary performance evaluation is provided below.
> > > We welcome feedback and we would be happy to contribute the code
> > > to HBase once the major performance issue is resolved.
> > >
> > > DATA MODEL
> > > An index can be created for a column, a column family or all the
> > > columns. In the implementation, we extend the HRegion class so that
> > > it manages not only the store files, which store the column values of
> > > a region, but also the Lucene instances used to support indexing on
> > > columns.
> > >
> > > The following assumes a per-column index and in the end we'll briefly
> > > describe how per-column family index and all-column index work.
> > >
> > > UPDATING A COLUMN
> > > Upon receiving a column update request, a region not only adds the
> > > column to the cache part of the store, but also analyzes the column
> > > and adds it to the cache part of the index. Same as the store files,
> > > the Lucene index files are also written to HDFS.
> > >
> > > Following the HBase design, to avoid resource contention, a region
> > > server globally schedules the cache flush and the compaction of both
> > > the store files and the index files of all the regions on the server.
> > >
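The update path described above (one write landing in both the store cache and the index cache) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `ToyIndexedRegion` is a hypothetical name, the whitespace tokenizer merely stands in for a Lucene analyzer, and the caches are plain in-memory dictionaries with no flush to HDFS. It is not the implementation discussed in this thread.

```python
from collections import defaultdict


class ToyIndexedRegion:
    """Hypothetical stand-in for a region that keeps a store cache and an index cache."""

    def __init__(self):
        self.store_cache = {}                # (row, column) -> latest value
        self.index_cache = defaultdict(set)  # (column, term) -> set of rows

    @staticmethod
    def _analyze(value):
        # Assumption: lowercase whitespace tokenization instead of a real analyzer.
        return set(value.lower().split())

    def update(self, row, column, value):
        # Drop index entries that belonged to the previous value of this cell.
        old = self.store_cache.get((row, column))
        if old is not None:
            for term in self._analyze(old):
                self.index_cache[(column, term)].discard(row)
        # Add the column to the store cache, then analyze it into the index cache.
        self.store_cache[(row, column)] = value
        for term in self._analyze(value):
            self.index_cache[(column, term)].add(row)

    def search(self, column, term):
        # Return matching rows in sorted order so results are deterministic.
        return sorted(self.index_cache.get((column, term.lower()), ()))


region = ToyIndexedRegion()
region.update("row1", "info:title", "Hello HBase")
region.update("row2", "info:title", "hello Lucene")
hello_rows = region.search("info:title", "hello")
```

The point of the sketch is only that a single update touches both structures, so the index never lags behind the store cache within a region.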
> > > QUERYING AN INDEX
> > > We add to HTable the following method to enable querying an index.
> > >    Results search(range, column, query, max_num_hits);
> > > Depending on the specified key range, a client sends a search request
> > > to one or more region servers, which call the search method of the
> > > queried regions. The client then merges the results from all the
> > > queried regions.
> > >
> > > In the current implementation, queries are conducted on the index files
> > > stored in HDFS.
> > >
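The client-side merge step described above can be sketched roughly as follows. This is a hedged illustration in Python, not the actual HTable code: the `Hit` type, the `merge_region_hits` name, and the assumption that each region returns its hits already sorted by descending score are all hypothetical.

```python
import heapq
from itertools import islice
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    score: float
    row_key: str


def merge_region_hits(per_region_hits: Iterable[List[Hit]],
                      max_num_hits: int) -> List[Hit]:
    """Merge per-region hit lists into a single top-N list.

    Assumption: every input list is already sorted by descending score,
    so a k-way merge is enough and no global re-sort is needed.
    """
    merged = heapq.merge(*per_region_hits, key=lambda hit: -hit.score)
    return list(islice(merged, max_num_hits))


region_a = [Hit(0.9, "row-a1"), Hit(0.4, "row-a2")]
region_b = [Hit(0.7, "row-b1"), Hit(0.1, "row-b2")]
top_hits = merge_region_hits([region_a, region_b], max_num_hits=3)
```

With these inputs the merged list starts with the hit from `region_a` scoring 0.9, followed by the 0.7 hit from `region_b`, and is cut off after three entries.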
> > > SPLITTING A REGION
> > > The region split works the same way as before, except that in addition
> > > to creating reference files for store files, reference files are also
> > > created for index files in the child regions. The old parent region
> > > will be deleted once all the reference files are deleted.
> > >
> > > PERFORMANCE ISSUES
> > > Our preliminary performance experiments show that the performance
> > > of building an index is quite reasonable. However, the performance of
> > > random reads in HDFS is so poor that the search performance is
> > > dramatically worse than that on local file systems.
> > >
> > > We are exploring different ways to solve this problem. One possibility
> > > is to store a copy on the local file system. On the other hand, most
> > > likely HDFS already stores a local copy...
> > >
> > > VARIATIONS
> > > As we mentioned earlier, an index can also be created for a column
> > > family or for all the columns. If an index is created for a column
> > > family, whenever a column is updated, the rest of the column family
> > > needs to be retrieved to re-index the column family. This adds some
> > > overhead to the indexing process. Also, it is an open question what
> > > the best versioning semantics are.
> > >
> >
> >
> >
> > --
> > Marcus Herou CTO and co-founder Tailsweep AB
> > +46702561312
> > [EMAIL PROTECTED]
> > http://www.tailsweep.com/
> > http://blogg.tailsweep.com/
>
>


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/
