By lexical search I mean searching the terms (words, phrases, sub-strings, combinations) of the row values. Lucene is an Apache project that does document indexing on terms.
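To make "indexing on terms" concrete, here is a rough sketch of a tiny in-memory inverted index over row values. It is only an illustration of the idea, not how Lucene or Accumulo implement it; the function names (tokenize, build_index, search_all) and the sample rows are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split a row value into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(rows):
    """Map each term to the set of row IDs whose value contains it."""
    index = defaultdict(set)
    for row_id, value in rows.items():
        for term in tokenize(value):
            index[term].add(row_id)
    return index


def search_all(index, query):
    """Return row IDs whose values contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so intersections do not mutate the index.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical row values, keyed by row ID.
rows = {
    "row1": "Accumulo ingest throughput",
    "row2": "Lucene term index",
    "row3": "Accumulo term search",
}
index = build_index(rows)
matches = search_all(index, "accumulo term")
```

Phrase and sub-string matching would need more than this (term positions, n-grams or similar), which is the kind of thing a package like Lucene provides.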

v/r
Bob Thorman
Principal Big Data Engineer
AT&T Big Data CoE
2900 W. Plano Parkway
Plano, TX 75075
972-658-1714

On 7/24/14, 9:52 AM, "Kepner, Jeremy - 0553 - MITLL" <[email protected]> wrote:

>What is meant by lexical search? Lucene style?
>
>http://www.lucenetutorial.com/lucene-query-syntax.html
>
>If so, these searches could be prioritized (not all are particularly
>useful), and it shouldn't be too hard to come up with recommended
>Accumulo approaches for the most important lexical searches.
>
>On Jul 24, 2014, at 10:44 AM, Donald Miner <[email protected]> wrote:
>
>> One problem I ran into when thinking about this problem is throughput. In
>> accumulo, we talk about tens or hundreds of thousands or millions of
>> records per second. A lot of these search solutions talk about hundreds or
>> thousands of documents per second.
>>
>> This problem that Accumulo is able to outpace just about anything lead me
>> to think that some sort of microbatch solution might be the best choice. If
>> you wait for your data to be indexed before moving on to the next Accumulo
>> insert you can start lagging behind. Basically, you are crippling your
>> ingest throughput by making it the slower of the two systems.
>>
>> It seems like a more microbatch (or batch) approach might be worthwhile--
>> what you are trading is your text index lagging behind, but you keep your
>> ingest throughput in Accumulo. I think Apache Blur does batch parallel
>> indexing, which is why I was looking at it for this.
>>
>>
>> On Thu, Jul 24, 2014 at 10:27 AM, Roshan Punnoose <[email protected]> wrote:
>>
>>> Yeah I think David's solution is the best. Though I like the idea of having
>>> a server side Constraint or hook that puts the updates into the queue.
>>>
>>> The Cassandra work I had seen actually tightly couples a Cassandra node to
>>> a Solr shard. So all the data that exists on that specific node also exists
>>> on that specific Solr shard. Would be pretty cool to do the same thing with
>>> a tablet server => local Solr shard.
>>>
>>>
>>> On Wed, Jul 23, 2014 at 6:09 PM, David Medinets <[email protected]>
>>> wrote:
>>>
>>>> Ingest to a queue. Have two processes subscribe to the queue. One
>>>> pushing into Accumulo and the other pushing into SolrCloud. Why
>>>> tightly couple the capabilities?
>>>>
>>>> On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <[email protected]>
>>>> wrote:
>>>>> Is there a way to tie into the write process in Accumulo? Maybe just use an
>>>>> Iterator that worked on compaction to send data to blur/solr? I have seen
>>>>> something similar in Cassandra, a data hook to save data in Solr.
>>>>>
>>>>>
>>>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <[email protected]> wrote:
>>>>>
>>>>>> We were trying to do so, but adding visibility while adding/searching
>>>>>> documents needs lot more thinking. Adding visibility to core search engine
>>>>>> needs changes to algorithm and that does not make it very scalable.
>>>>>> Integration besides granular visibility is very doable. and we had taken
>>>>>> inspiration from Solandra.
>>>>>>
>>>>>> Obviously if we can get it done it adds lot of value. I believe Sqrrl
>>>>>> people have already done it, are they thinking to open source it anytime in
>>>>>> future?
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> We briefly toyed with blur on accumulo but didnt get too far just because
>>>>>>> it was obe. I think that would be cool.
>>>>>>>
>>>>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <[email protected]> wrote:
>>>>>>>>
>>>>>>>> It's definitely possible. I remember hearing about someone doing lucene
>>>>>>>> on top of Accumulo once, but I don't recall seeing a nice package with a
>>>>>>>> bow on top.
>>>>>>>>
>>>>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>>>>>>>>> What lexical search package (like lucene/solr) has anyone put on top of
>>>>>>>>> accumulo? Is this possible or does everyone just index log files and
>>>>>>>>> documents?
>>>>>>>>>
>>>>>>>>> v/r
>>>>>>>>> Bob Thorman
>>>>>>>>> Principal Big Data Engineer
>>>>>>>>> AT&T Big Data CoE
>>>>>>>>> 2900 W. Plano Parkway
>>>>>>>>> Plano, TX 75075
>>>>>>>>> 972-658-1714
>>
>> --
>>
>> Donald Miner
>> Chief Technology Officer
>> ClearEdge IT Solutions, LLC
>> Cell: 443 799 7807
>> www.clearedgeit.com
>
