In my previous job I worked with Blur and even presented it at a Lucene conference. I also tried integrating Blur with Spark Streaming and presented that work:

http://www.slideshare.net/lucidworks/near-real-time-indexing-kafka-messages-into-apache-blur-presented-by-dibyendu-bhattacharya-pearson-north-america
https://www.youtube.com/watch?v=n7lfYhJgtJo
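The core of the write path is small: a foreachRDD over the Kafka stream, with one Thrift client per partition pushing RowMutations into the table. A simplified sketch below, not the exact code from the talk; the "logs" family, the "message" column, and the random ids are placeholders:

import java.util.UUID;

import org.apache.blur.thrift.BlurClient;
import org.apache.blur.thrift.generated.Blur;
import org.apache.blur.thrift.generated.Column;
import org.apache.blur.thrift.generated.Record;
import org.apache.blur.thrift.generated.RecordMutation;
import org.apache.blur.thrift.generated.RecordMutationType;
import org.apache.blur.thrift.generated.RowMutation;
import org.apache.blur.thrift.generated.RowMutationType;
import org.apache.spark.streaming.api.java.JavaDStream;

public class BlurStreamingSink {

  // Push every message of each micro-batch into a Blur table through the
  // Thrift mutate call, one row per message.
  public static void indexStream(JavaDStream<String> messages,
                                 String controller, String table) {
    messages.foreachRDD(rdd -> rdd.foreachPartition(partition -> {
      // One client per partition, e.g. controller = "controller1:40010"
      Blur.Iface client = BlurClient.getClient(controller);
      while (partition.hasNext()) {
        Record record = new Record();
        record.setRecordId(UUID.randomUUID().toString());
        record.setFamily("logs");                          // placeholder family
        record.addToColumns(new Column("message", partition.next()));

        RecordMutation recMut = new RecordMutation();
        recMut.setRecordMutationType(RecordMutationType.REPLACE_ENTIRE_RECORD);
        recMut.setRecord(record);

        RowMutation rowMut = new RowMutation();
        rowMut.setTable(table);
        rowMut.setRowId(UUID.randomUUID().toString());     // placeholder row key
        rowMut.setRowMutationType(RowMutationType.REPLACE_ROW);
        rowMut.addToRecordMutations(recMut);

        client.mutate(rowMut);
      }
    }));
  }
}

Creating the Thrift client inside foreachPartition keeps the non-serializable connection out of the closure that Spark ships to the executors.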
Regards,
Dibyendu

On Fri, Jan 6, 2017 at 4:49 PM, Lukáš Vlček <[email protected]> wrote:

> Hi Aaron,
>
> A question regarding the Blur data ingestion focus:
>
> Do I read it correctly that Blur is not a near-real-time system (as both ES
> and Solr are)? For example, would Blur be a valid candidate for an
> aggregated logging use case? How long does it usually take for indexed data
> to become searchable (ms, sec, mins)?
>
> As for retention of the data, what are the strategies for dropping old data
> from the index? For example, is there anything like dropping old data based
> on index name patterns (given the index name contains a timestamp)?
>
> thanks,
> Lukáš
>
> > - Massive data ingestion
> >
> > Basically, the focus of ingestion was not on latency but rather on having
> > the ability to incrementally add large amounts of data to an index that
> > is likely also very large on its own. The project uses YARN MR for this,
> > and it is not a quick way to bring in data, but if your need is to index
> > large chunks of data incrementally, it works very well. A full reindex,
> > if needed, could be done easily as well. Something to point out here is
> > that the MR indexing puts very little strain on the running system while
> > performing the updates/reindexes; I believe this differs from how ES and
> > Solr are implemented.
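For reference, the MR bulk-indexing path quoted above is wired up roughly as follows, following the example in the Blur map/reduce documentation. The controller address, table name, and the "cf1" CSV column layout are placeholders; as I understand it, the job builds the Lucene indexes in HDFS and the live shard servers only pick them up afterwards, which is why the running system sees so little load.

import org.apache.blur.mapreduce.lib.BlurOutputFormat;
import org.apache.blur.mapreduce.lib.CsvBlurMapper;
import org.apache.blur.thrift.BlurClient;
import org.apache.blur.thrift.generated.Blur;
import org.apache.blur.thrift.generated.TableDescriptor;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class BlurBulkIndex {

  public static void main(String[] args) throws Exception {
    String controller = args[0];   // e.g. "controller1:40010"
    String table = args[1];
    String input = args[2];        // HDFS dir with CSV data to index

    // The table layout comes from a controller; BlurOutputFormat uses it
    // to write one Lucene index per shard directly in HDFS.
    Blur.Iface client = BlurClient.getClient(controller);
    TableDescriptor descriptor = client.describe(table);

    Job job = Job.getInstance(new Configuration(), "blur-bulk-index-" + table);
    job.setJarByClass(BlurBulkIndex.class);
    job.setMapperClass(CsvBlurMapper.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(input));

    // Placeholder column layout: family "cf1" with two CSV columns.
    CsvBlurMapper.addColumns(job, "cf1", "col1", "col2");

    // Wires in the Blur output format and matching reducer setup.
    BlurOutputFormat.setupJob(job, descriptor);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}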
