Hi,

Not sure about others, but this didn't come through as a table....
For a Solr vs. ES comparison see
http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/

Would be cool to see something like that for Blur!

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Dec 13, 2013 at 7:35 AM, Naresh Yadav <[email protected]> wrote:

> Hi,
>
> I am a little new to these technologies and may not be at the right stage
> to answer these questions, but I have read a lot comparing these
> technologies, and I put together this comparison table based on my
> initial understanding:
>
> Feature                                         | BLUR                                  | ElasticSearch
> ------------------------------------------------+---------------------------------------+--------------
> Supports Lucene over HDFS                       | Yes                                   | Yes
> Dynamic columns indexing                        | Yes                                   | Yes
> Internally uses MapReduce to store/update index | Yes                                   | No
> Index storage options                           | Only FileSystem/HDFS                  | Many options
> In-memory indexing                              | No                                    | Yes
> HDFS lacks a page cache, so builds its own      | Yes (has the concept of a BlockCache) | No
> Write-ahead log for indexes                     | Yes                                   | No
>
> I may be wrong in my understanding of a few of these, as I have only read
> about them and not actually used them on a real problem.
> About Solr: it has used BLUR code for its integration with HDFS and does
> not support MapReduce to store/update indexes.
>
> Thanks
> Naresh
>
> On Tue, Dec 10, 2013 at 1:45 AM, Aaron McCurry <[email protected]> wrote:
>
> > On Sun, Dec 8, 2013 at 10:32 PM, Otis Gospodnetic <[email protected]>
> > wrote:
> >
> > > Thanks for the info about other distributed FSs being an option.
> > > I'd guess relying on the distributed FS is nice for any very large
> > > deployment, but I wonder if that requirement is a hindrance for any
> > > small to medium sized deployment that needs more than 1 shard
> > > server, but doesn't quite want the whole dist FS machinery.
> > >
> > > What's your experience?
> >
> > I don't see running the HDFS part of Hadoop as very hard to do;
> > MapReduce might be overkill for some people though.
> >
> > > Distributed trace sounds nice and useful! Is it exposed via JMX or
> > > some other API? I'd want us to capture that with SPM once we add
> > > support for Blur monitoring to SPM.
> >
> > All the trace information is available through the standard Thrift API
> > in Blur. There's also a pluggable API for how the traces are stored;
> > current implementations are in ZooKeeper and HDFS, as well as just
> > logging the info.
> >
> > Aaron
> >
> > > Otis
> > > --
> > > Performance Monitoring * Log Analytics * Search Analytics
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> > > On Sun, Dec 8, 2013 at 10:10 AM, Aaron McCurry <[email protected]>
> > > wrote:
> > >
> > > > On Sun, Dec 8, 2013 at 9:57 AM, Otis Gospodnetic
> > > > <[email protected]> wrote:
> > > >
> > > > > Thanks Aaron for this info. This sounds very similar to both
> > > > > Solr/ES..... from this description I can't really see any
> > > > > significant difference. Perhaps the main difference is that
> > > > > with Solr/ES Hadoop/HDFS/MapReduce is something that's optional
> > > > > and that most people do not (need to) use, while
> > > > > Hadoop/HDFS/MapReduce are an integral part of Blur's offering
> > > > > and you can't have Blur without them.
> > > >
> > > > I haven't ever run Blur without HDFS.
> > > > Technically you could run any distributed file system with Blur,
> > > > but a distributed FS is required if you want to go beyond 1 shard
> > > > server.
> > > >
> > > > MapReduce is not required, only a distributed FS and ZooKeeper.
> > > >
> > > > > What is distributed tracing? I can't map that to anything in
> > > > > Solr/ES.
> > > >
> > > > It allows the client to start a trace of the request(s) they make.
> > > > The trace propagates through the entire stack, gathering timing
> > > > around all the traceable sections of code. It also traverses
> > > > threads and network calls. It helps to explain where the time goes
> > > > for a given request. There is also a display for the trace built
> > > > into the status pages of Blur.
> > > >
> > > > Aaron
> > > >
> > > > > Thanks,
> > > > > Otis
> > > > > --
> > > > > Performance Monitoring * Log Analytics * Search Analytics
> > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > >
> > > > >
> > > > > On Sun, Dec 8, 2013 at 9:26 AM, Aaron McCurry <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi James,
> > > > > >
> > > > > > Thanks for your interest and questions, I will attempt to
> > > > > > answer your questions below.
> > > > > >
> > > > > >
> > > > > > On Sat, Dec 7, 2013 at 8:47 AM, James Kebinger
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Aaron, I'm wondering if you can talk a little about how
> > > > > > > you see Blur differentiating itself from ElasticSearch and
> > > > > > > Solr. It seems like both of them, in particular Solr after
> > > > > > > picking up some Blur code, are gaining more abilities to
> > > > > > > interact with hadoop and HDFS.
> > > > > >
> > > > > > Unfortunately I'm not an expert in Solr or ElasticSearch.
> > > > > > I can tell you about Blur's high-level features in terms of
> > > > > > how it interacts with Hadoop.
> > > > > >
> > > > > > - Index storage (The obvious one)
> > > > > > - Bulk offline indexing, with incremental updates.
> > > > > >   This one gives you the ability to perform indexing on a
> > > > > >   dedicated MapReduce cluster and simply move the index
> > > > > >   updates to the running Blur cluster for importing.
> > > > > > - WAL (write ahead log) is written to use HDFS
> > > > > > - Also we are currently moving most of the meta data from
> > > > > >   ZooKeeper storage to HDFS storage. This makes interacting
> > > > > >   with the meta data of a table easy to do from within
> > > > > >   MapReduce jobs
> > > > > >
> > > > > > > How does a blur install differ from a solr setup reading
> > > > > > > off hdfs?
> > > > > >
> > > > > > Again I'm not an expert in Solr. Blur's setup runs a cluster
> > > > > > of shard servers that serve shards (indexes) of the table
> > > > > > within that shard cluster. The indexes are stored once in HDFS
> > > > > > (not counting the HDFS replication here) and evenly
> > > > > > distributed across whatever shard servers are online. Blur
> > > > > > utilizes a BlockCache (think file system cache) that is an
> > > > > > off-heap based system. The first version of this was
> > > > > > originally picked up by Cloudera and modified (I'm assuming)
> > > > > > and committed back into the Lucene/Solr code base. The second
> > > > > > version of this block cache (Blur 0.2.2 stable) is now the
> > > > > > default in Blur.
> > > > > > It has several advantages over the first version:
> > > > > >
> > > > > > http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201310.mbox/%3CCAB6tTr0Nr2aDLc4kkHoeqiO-utwzBAhb=Ru==gmhqry4axp...@mail.gmail.com%3E
> > > > > >
> > > > > > One interesting feature of Blur is the ability to run a
> > > > > > cluster of controllers (controllers are used to make the shard
> > > > > > cluster look like a single service) in front of multiple shard
> > > > > > clusters. This can help to deal with reindexes of data,
> > > > > > meaning that you can reindex all your data into a new cluster
> > > > > > and not affect the performance of the cluster that your users
> > > > > > may be interacting with.
> > > > > >
> > > > > >
> > > > > > Some of the overall features of Blur are:
> > > > > > - NRT updates of data
> > > > > > - Offline bulk indexing
> > > > > > - Block cache for fast query performance
> > > > > > - Index warmup (pulls parts of the index up into block cache
> > > > > >   when a segment is brought online)
> > > > > > - Performance metrics gathering
> > > > > > - Distributed tracing
> > > > > > - Custom index types
> > > > > > - Custom server side logic can be implemented (basic)
> > > > > >
> > > > > > I'm sure there are many more.
> > > > > >
> > > > > > Hope this helps.
> > > > > >
> > > > > > Aaron
> > > > > >
> > > > > > > thanks
> > > > > > >
> > > > > > > James
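
For readers who, like Otis, have no Solr/ES equivalent to map distributed tracing onto, here is a rough sketch of the idea Aaron describes: the client starts a trace, timing is gathered around each traceable section, and the trace is carried across a thread boundary. This is not Blur's API. Every name in it (Trace, start_trace, traced, run_in_thread, the section names) is made up for illustration, and a network hop is not modelled at all.

```python
import threading
import time
from contextlib import contextmanager

# Hypothetical sketch: none of these names come from Blur.
_current = threading.local()


class Trace:
    """Collects (section name, nesting depth, seconds) for one request."""

    def __init__(self, trace_id):
        self.trace_id = trace_id
        self.records = []
        self._lock = threading.Lock()

    def add(self, name, depth, seconds):
        with self._lock:
            self.records.append((name, depth, seconds))


@contextmanager
def start_trace(trace_id):
    """Client side: trace everything that runs inside the block."""
    trace = Trace(trace_id)
    _current.trace, _current.depth = trace, 0
    try:
        yield trace
    finally:
        _current.trace = None


@contextmanager
def traced(name):
    """Time one traceable section; does nothing if no trace is active."""
    trace = getattr(_current, "trace", None)
    if trace is None:
        yield
        return
    depth = _current.depth
    _current.depth = depth + 1
    started = time.perf_counter()
    try:
        yield
    finally:
        _current.depth = depth
        trace.add(name, depth, time.perf_counter() - started)


def run_in_thread(fn):
    """Carry the active trace and depth across a thread boundary."""
    trace = getattr(_current, "trace", None)
    depth = getattr(_current, "depth", 0)

    def wrapper():
        _current.trace, _current.depth = trace, depth
        fn()

    worker = threading.Thread(target=wrapper)
    worker.start()
    worker.join()


def search_shard():
    with traced("shard.search"):
        time.sleep(0.01)  # stand-in for real work


with start_trace("request-1") as trace:
    with traced("controller.query"):
        run_in_thread(search_shard)

# Sections are recorded as they finish, so inner sections come first.
for name, depth, seconds in trace.records:
    print("  " * depth + f"{name}: {seconds * 1000:.1f} ms")
```

The printed timings show where the time went for the one traced request, which is the purpose Aaron gives for the feature. How Blur actually stores and exposes traces (Thrift, ZooKeeper, HDFS) is only what is stated in the thread.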

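
The "think file system cache" description of the BlockCache can be illustrated with a small sketch. It assumes fixed-size blocks keyed by (file, block index) and LRU eviction. Both are my assumptions, not details given in the thread, and the sketch keeps blocks in an ordinary Python dict rather than off-heap memory. The file name and the slow_read_block function are hypothetical stand-ins for a read against a distributed FS.

```python
from collections import OrderedDict

BLOCK_SIZE = 8  # tiny for the demo; a real cache would use KB-sized blocks


class BlockCache:
    """LRU cache of fixed-size file blocks, keyed by (path, block index)."""

    def __init__(self, read_block, capacity):
        self._read_block = read_block  # slow fetch, e.g. from a remote FS
        self._capacity = capacity      # maximum number of cached blocks
        self._blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _get_block(self, path, index):
        key = (path, index)
        if key in self._blocks:
            self.hits += 1
            self._blocks.move_to_end(key)  # mark as most recently used
            return self._blocks[key]
        self.misses += 1
        block = self._read_block(path, index)
        self._blocks[key] = block
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict least recently used
        return block

    def read(self, path, offset, length):
        """Return up to `length` bytes from `offset`, block by block."""
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, BLOCK_SIZE)
            block = self._get_block(path, index)
            chunk = block[start:start + (end - offset)]
            if not chunk:
                break  # read past the end of the file
            out += chunk
            offset += len(chunk)
        return bytes(out)


# Hypothetical backing store standing in for index files on a remote FS.
FILES = {"segment_1.dat": bytes(range(32))}


def slow_read_block(path, index):
    data = FILES[path]
    return data[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]


cache = BlockCache(slow_read_block, capacity=2)
first = cache.read("segment_1.dat", 6, 4)   # spans blocks 0 and 1
second = cache.read("segment_1.dat", 6, 4)  # both blocks now cached
print(first == second, cache.hits, cache.misses)
```

The second read touches the backing store zero times, which is the effect a block cache is meant to have on repeated index reads when the underlying file system offers no page cache of its own.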