Yes, you have missed my original request. I need a fast (i.e., pre-indexed) way to perform lexical searches on row values without using a regex-based iterator. I also do not want to duplicate data from the cluster into a separate document-based store, which is what packages like Apache Lucene typically require.

v/r
Bob Thorman
Principal Big Data Engineer
AT&T Big Data CoE
2900 W. Plano Parkway
Plano, TX 75075
972-658-1714

On 7/24/14, 11:37 AM, "Nehal Mehta" <[email protected]> wrote:

>If we have two streams, we would just store data into Accumulo and use it
>as backend. What we are/were trying to implement was secure search. So if
>user does not have rights to search that cell, user can see other listing
>but not one which is inaccessible. By doing so we would add lot more
>value.
>
>Am I missing something?
>
>
>On Thu, Jul 24, 2014 at 12:17 PM, THORMAN, ROBERT D <[email protected]>
>wrote:
>
>> Search the terms (words, phases, sub-strings, combinations) of the row
>> values. Lucene is an apache project that does document indexing on
>> terms.
>>
>> v/r
>> Bob Thorman
>> Principal Big Data Engineer
>> AT&T Big Data CoE
>> 2900 W. Plano Parkway
>> Plano, TX 75075
>> 972-658-1714
>>
>> On 7/24/14, 9:52 AM, "Kepner, Jeremy - 0553 - MITLL" <[email protected]>
>> wrote:
>>
>> >What is meant by lexical search? Lucene style?
>> >
>> >http://www.lucenetutorial.com/lucene-query-syntax.html
>> >
>> >If so, these searches could be prioritized (not all are particularly
>> >useful), and it shouldn't be too hard to come up with recommended
>> >Accumulo approaches for the most important lexical searches.
>> >
>> >On Jul 24, 2014, at 10:44 AM, Donald Miner <[email protected]>
>> >wrote:
>> >
>> >> One problem I ran into when thinking about this problem is throughput.
>> >> In accumulo, we talk about tens or hundreds of thousands or millions of
>> >> records per second. A lot of these search solutions talk about hundreds
>> >> or thousands of documents per second.
>> >>
>> >> This problem that Accumulo is able to outpace just about anything lead
>> >> me to think that some sort of microbatch solution might be the best
>> >> choice. If you wait for your data to be indexed before moving on to the
>> >> next Accumulo insert you can start lagging behind. Basically, you are
>> >> crippling your ingest throughput by making it the slower of the two
>> >> systems.
>> >>
>> >> It seems like a more microbatch (or batch) approach might be
>> >> worthwhile-- what you are trading is your text index lagging behind,
>> >> but you keep your ingest throughput in Accumulo. I think Apache Blur
>> >> does batch parallel indexing, which is why I was looking at it for this.
>> >>
>> >>
>> >> On Thu, Jul 24, 2014 at 10:27 AM, Roshan Punnoose <[email protected]>
>> >> wrote:
>> >>
>> >>> Yeah I think David's solution is the best. Though I like the idea of
>> >>> having a server side Constraint or hook that puts the updates into the
>> >>> queue.
>> >>>
>> >>> The Cassandra work I had seen actually tightly couples a Cassandra
>> >>> node to a Solr shard. So all the data that exists on that specific node
>> >>> also exists on that specific Solr shard. Would be pretty cool to do the
>> >>> same thing with a tablet server => local Solr shard.
>> >>>
>> >>>
>> >>> On Wed, Jul 23, 2014 at 6:09 PM, David Medinets
>> >>> <[email protected]> wrote:
>> >>>
>> >>>> Ingest to a queue. Have two processes subscribe to the queue. One
>> >>>> pushing into Accumulo and the other pushing into SolrCloud. Why
>> >>>> tightly couple the capabilities?
>> >>>>
>> >>>> On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <[email protected]>
>> >>>> wrote:
>> >>>>> Is there a way to tie into the write process in Accumulo? Maybe just
>> >>>>> use an Iterator that worked on compaction to send data to blur/solr?
>> >>>>> I have seen something similar in Cassandra, a data hook to save data
>> >>>>> in Solr.
>> >>>>>
>> >>>>>
>> >>>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <[email protected]>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> We were trying to do so, but adding visibility while adding/searching
>> >>>>>> documents needs lot more thinking. Adding visibility to core search
>> >>>>>> engine needs changes to algorithm and that does not make it very
>> >>>>>> scalable. Integration besides granular visibility is very doable. and
>> >>>>>> we had taken inspiration from Solandra.
>> >>>>>>
>> >>>>>> Obviously if we can get it done it adds lot of value. I believe Sqrrl
>> >>>>>> people have already done it, are they thinking to open source it
>> >>>>>> anytime in future?
>> >>>>>>
>> >>>>>>
>> >>>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <[email protected]>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> We briefly toyed with blur on accumulo but didnt get too far just
>> >>>>>>> because it was obe. I think that would be cool.
>> >>>>>>>
>> >>>>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <[email protected]> wrote:
>> >>>>>>>>
>> >>>>>>>> It's definitely possible. I remember hearing about someone doing
>> >>>>>>>> lucene on top of Accumulo once, but I don't recall seeing a nice
>> >>>>>>>> package with a bow on top.
>> >>>>>>>>
>> >>>>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>> >>>>>>>>> What lexical search package (like lucene/solr) has anyone put on
>> >>>>>>>>> top of accumulo? Is this possible or does everyone just index log
>> >>>>>>>>> files and documents?
>> >>>>>>>>>
>> >>>>>>>>> v/r
>> >>>>>>>>> Bob Thorman
>> >>>>>>>>> Principal Big Data Engineer
>> >>>>>>>>> AT&T Big Data CoE
>> >>>>>>>>> 2900 W. Plano Parkway
>> >>>>>>>>> Plano, TX 75075
>> >>>>>>>>> 972-658-1714
>> >>
>> >> --
>> >>
>> >> Donald Miner
>> >> Chief Technology Officer
>> >> ClearEdge IT Solutions, LLC
>> >> Cell: 443 799 7807
>> >> www.clearedgeit.com
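
For illustration only, the pre-indexed term lookup requested at the top of this thread might be sketched roughly as below. This is not code from anyone on the thread and it does not use the Accumulo API. `TermIndex`, `tokenize`, and the sample rows are hypothetical names, and a plain in-memory dict stands in for what would presumably be a separate index table keyed by term. The index keeps only row IDs per term, so row values are not copied into a document store and a lookup does not scan values with a regex.

```python
import re
from collections import defaultdict


def tokenize(value):
    """Lowercase a row value and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", value.lower())


class TermIndex:
    """In-memory stand-in for a hypothetical term index table."""

    def __init__(self):
        # term -> set of row ids whose value contains that term
        self._postings = defaultdict(set)

    def add(self, row_id, value):
        """Index one row: record its id under every term in its value."""
        for term in tokenize(value):
            self._postings[term].add(row_id)

    def search(self, query):
        """Return sorted row ids whose values contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # Intersect the posting sets; .get() avoids creating empty entries.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return sorted(result)


# Hypothetical sample rows, indexed at write time.
index = TermIndex()
index.add("row1", "Accumulo ingest throughput")
index.add("row2", "Lucene document indexing on terms")
index.add("row3", "Accumulo term indexing")

print(index.search("accumulo indexing"))
```

This sketch covers whole-word, all-terms-must-match queries only. Phrase, sub-string, and combination searches, as well as per-cell visibility, would need additional index structures that are not shown here.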
