Re: Contrast of Blur to ElasticSearch, Solr

Otis Gospodnetic Fri, 13 Dec 2013 06:45:50 -0800

Hi,

Not sure about others, but this didn't come through as a table....


For Solr vs. ES comparison see
http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/
Would be cool to see something like that for Blur!

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Dec 13, 2013 at 7:35 AM, Naresh Yadav <[email protected]> wrote:

> Hi,
>
> I am a little new to these technologies and may not be at the right stage to
> answer these questions, but I have read a lot comparing them and put together
> this comparison table based on my initial understanding:
>
> Feature                                         | BLUR                           | ElasticSearch
> ------------------------------------------------+--------------------------------+--------------
> Supports Lucene over HDFS                       | Yes                            | Yes
> Dynamic Columns Indexing                        | Yes                            | Yes
> Internally uses MapReduce to store/update index | Yes                            | No
> Index Storage many options                      | Only FileSystem/HDFS           | Many Options
> In memory Indexing                              | No                             | Yes
> HDFS lacks page cache so build own              | Yes have concept of BlockCache | No
> WriteAhead Log for Indexes                      | Yes                            | No
>
> I may be wrong in my understanding of a few of these, as I have only read
> about them and not actually used them on a real problem.
> As for Solr, it has used Blur code for its HDFS integration and does not
> support MapReduce to store/update indexes.
>
> Thanks
> Naresh
>
> On Tue, Dec 10, 2013 at 1:45 AM, Aaron McCurry <[email protected]> wrote:
>
> > On Sun, Dec 8, 2013 at 10:32 PM, Otis Gospodnetic <
> > [email protected]> wrote:
> >
> > > Thanks for the info about other distributed FSs being an option.  I'd
> > > guess relying on the distributed FS is nice for any very large
> > > deployment, but I wonder if that requirement is a hindrance for any
> > > small to medium sized deployment that needs more than 1 shard server,
> > > but doesn't quite want the whole dist FS machinery.
> > >
> > > What's your experience?
> > >
> >
> > I don't see running the HDFS part of Hadoop as very hard to do; MapReduce
> > might be overkill for some people though.
> >
> >
> > >
> > > Distributed trace sounds nice and useful!  Is it exposed via JMX or
> > > some other API?  I'd want us to capture that with SPM once we add
> > > support for Blur monitoring to SPM.
> > >
> >
> > All the trace information is available through the standard Thrift API in
> > Blur.  There's also a pluggable API for how the traces are stored; current
> > implementations are in ZooKeeper and HDFS, as well as just logging the
> > info.
> >
> > Aaron
> >
> >
> > >
> > > Otis
> > > --
> > > Performance Monitoring * Log Analytics * Search Analytics
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> > > On Sun, Dec 8, 2013 at 10:10 AM, Aaron McCurry <[email protected]>
> > wrote:
> > >
> > > > On Sun, Dec 8, 2013 at 9:57 AM, Otis Gospodnetic <
> > > > [email protected]
> > > > > wrote:
> > > >
> > > > > Thanks Aaron for this info.  This sounds very similar to both
> > > > > Solr/ES... from this description I can't really see any
> > > > > significant difference.  Perhaps the main difference is that with
> > > > > Solr/ES Hadoop/HDFS/MapReduce is something that's optional and
> > > > > that most people do not (need to) use, while Hadoop/HDFS/MapReduce
> > > > > are an integral part of Blur's offering and you can't have Blur
> > > > > without them.
> > > > >
> > > >
> > > > I haven't ever run Blur without HDFS.  Technically you could run any
> > > > distributed file system with Blur, but a distributed FS is required
> > > > if you want to go beyond 1 shard server.
> > > >
> > > > MapReduce is not required, only a distributed FS and ZooKeeper.
> > > >
> > > >
> > > > >
> > > > > What is distributed tracing?  I can't map that to anything in
> > > > > Solr/ES.
> > > > >
> > > >
> > > > It allows the client to start a trace of the request(s) they make.
> > > > It propagates through the entire stack, gathering timing around all
> > > > the traceable sections of code.  It also traverses threads and
> > > > network calls.  It helps to explain where the time goes for a given
> > > > request.  There is also a display for the trace built into the status
> > > > pages of Blur.
> > > >
> > > > Aaron
> > > >
> > > >
> > > > >
> > > > > Thanks,
> > > > > Otis
> > > > > --
> > > > > Performance Monitoring * Log Analytics * Search Analytics
> > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Dec 8, 2013 at 9:26 AM, Aaron McCurry <[email protected]>
> > > > wrote:
> > > > >
> > > > > > Hi James,
> > > > > >
> > > > > > Thanks for your interest and questions.  I will attempt to
> > > > > > answer your questions below.
> > > > > >
> > > > > >
> > > > > > On Sat, Dec 7, 2013 at 8:47 AM, James Kebinger <
> > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Aaron, I'm wondering if you can talk a little about how
> > > > > > > Blur is differentiating itself from ElasticSearch and Solr.
> > > > > > > It seems like both of them, in particular Solr after picking
> > > > > > > up some Blur code, are gaining more abilities to interact
> > > > > > > with Hadoop and HDFS.
> > > > > > >
> > > > > >
> > > > > > Unfortunately I'm not an expert in Solr or ElasticSearch.  I can
> > > > > > tell you about Blur's high-level features when it comes to how
> > > > > > it interacts with Hadoop.
> > > > > >
> > > > > > - Index storage (the obvious one)
> > > > > > - Bulk offline indexing, with incremental updates.
> > > > > > This one gives you the ability to perform indexing on a
> > > > > > dedicated MapReduce cluster and simply move the index updates to
> > > > > > the running Blur cluster for importing.
> > > > > > - WAL (write ahead log) is written to use HDFS
> > > > > > - Also, we are currently moving most of the metadata from
> > > > > > ZooKeeper storage to HDFS storage.  This makes interacting with
> > > > > > the metadata of a table easy to do from within MapReduce jobs.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > How does a Blur install differ from a Solr setup reading off
> > > > > > > HDFS?
> > > > > > >
> > > > > >
> > > > > > Again I'm not an expert in Solr.  Blur's setup runs a cluster of
> > > > > > shard servers that serve shards (indexes) of the table within
> > > > > > that shard cluster.  The indexes are stored once in HDFS (not
> > > > > > counting the HDFS replication here) and evenly distributed across
> > > > > > whatever shard servers are online.  Blur utilizes a BlockCache
> > > > > > (think file system cache) that is an off-heap based system.  The
> > > > > > first version of this was originally picked up by Cloudera and
> > > > > > modified (I'm assuming) and committed back into the Lucene/Solr
> > > > > > code base.  The second version of this block cache (Blur 0.2.2
> > > > > > stable) is now the default in Blur.  It has several advantages
> > > > > > over the first version:
> > > > > >
> > > > > > http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201310.mbox/%3CCAB6tTr0Nr2aDLc4kkHoeqiO-utwzBAhb=Ru==gmhqry4axp...@mail.gmail.com%3E
> > > > > >
> > > > > > One interesting feature of Blur is the ability to run a cluster
> > > > > > of controllers (controllers are used to make the shard cluster
> > > > > > look like a single service) in front of multiple shard clusters.
> > > > > > This can help to deal with reindexes of data, meaning that you
> > > > > > can reindex all your data to a new cluster and not affect
> > > > > > performance of the cluster that your users may be interacting
> > > > > > with.
> > > > > >
> > > > > >
> > > > > > Some of the overall features of Blur are:
> > > > > > - NRT updates of data
> > > > > > - Offline bulk indexing
> > > > > > - Block cache for fast query performance
> > > > > > - Index warmup (pulls parts of the index up into block cache
> > > > > > when a segment is brought online)
> > > > > > - Performance metrics gathering
> > > > > > - Distributed tracing
> > > > > > - Custom index types
> > > > > > - Custom server side logic can be implemented (basic)
> > > > > >
> > > > > > I'm sure there are many more.
> > > > > >
> > > > > > Hope this helps.
> > > > > >
> > > > > > Aaron
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > thanks
> > > > > > >
> > > > > > > James
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
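
For anyone trying to picture the distributed tracing Aaron describes in the
thread (a client starts a trace, timing is gathered around traceable sections
of code, and the trace follows the request across threads), here is a rough
sketch in Python.  This is NOT Blur's API: Trace, traced, search_shard and
run_query are made-up names for illustration, and the network hop is left out.
It only shows the general idea of carrying a trace context into worker threads.

```python
import contextvars
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

# Hypothetical names throughout; this is an illustration, not Blur code.
_current_trace = contextvars.ContextVar("current_trace", default=None)


class Trace:
    """Collects (section name, thread name, elapsed seconds) for one request."""

    def __init__(self, trace_id):
        self.trace_id = trace_id
        self.spans = []
        self._lock = threading.Lock()

    def record(self, name, elapsed):
        with self._lock:
            self.spans.append((name, threading.current_thread().name, elapsed))


@contextmanager
def traced(name):
    """Time a section of code, but only if the caller started a trace."""
    trace = _current_trace.get()
    start = time.perf_counter()
    try:
        yield
    finally:
        if trace is not None:
            trace.record(name, time.perf_counter() - start)


def search_shard(shard):
    # Stand-in for the real work done against one shard.
    with traced(f"search:{shard}"):
        time.sleep(0.01)
        return f"hits-from-{shard}"


def run_query(shards, trace_id=None):
    """Fan a query out to shards; tracing is on only if trace_id is given."""
    trace = Trace(trace_id) if trace_id else None
    token = _current_trace.set(trace)
    try:
        with traced("query"):
            with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
                # copy_context() carries the active trace into each worker
                # thread, which is how the trace "traverses threads" here.
                futures = [
                    pool.submit(contextvars.copy_context().run, search_shard, s)
                    for s in shards
                ]
                results = [f.result() for f in futures]
    finally:
        _current_trace.reset(token)
    return results, trace
```

Calling run_query(["s1", "s2"], trace_id="t-1") returns the results plus a
Trace whose spans show where the time went; calling it without a trace_id does
no trace bookkeeping at all, since tracing is something the client opts into.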
