Thanks Jason. I want to look at the solr plugin and see where we can collaborate or if we already duplicated part of the effort.
I still need to push a few commits. I will share the code once I get these changes pushed.

- Rahul

On Mon, Aug 3, 2015 at 11:31 AM, Jason Altekruse <[email protected]> wrote:

> Hey Rahul,
>
> This is really cool! Thanks for all of the time you put into writing this,
> I think we have a lot of available opportunities to reach new communities
> with efforts like this.
>
> I noticed last week another contributor opened a JIRA for a solr plugin,
> there might be a good opportunity for the two of you to join efforts, as I
> believe he likely started working on a lucene reader as part of his solr
> work.
>
> Would you like to post a link to your work on Github or another public host
> of your code?
>
> https://issues.apache.org/jira/browse/DRILL-3585
>
> On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter <[email protected]>
> wrote:
>
> > Hi,
> >
> > I'm pretty new around here but I just wanted to tell you how much your
> > work can benefit us. This is great!
> >
> > Look forward to trying it out.
> >
> > Regards,
> > -Stefán
> >
> > On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli <[email protected]>
> > wrote:
> >
> > > Hello Drillers,
> > >
> > > I have been working on a lucene format plugin. In its current state, the
> > > below sample query successfully searches a lucene index and returns the
> > > results.
> > >
> > > select path from dfs_test.`/search-index` where contents='maxItemsPerBlock'
> > > and contents = 'BlockTreeTermsIndex'
> > >
> > > *High Level Overview of Current Implementation:*
> > >
> > > *Parallelization:* A lucene segment is the lowest level of
> > > parallelization.
> > > *Filter Pushdown:* Currently the format plugin is designed to push the
> > > complete filter into the scan.
> > > *Filter Evaluation:* Each condition in the filter is treated as a lucene
> > > TermQuery
> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html>
> > > and multiple conditions are joined using a BooleanQuery
> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html>.
> > > If we *do not* use a TermQuery, then we have to know the exact type of
> > > Analyzer
> > > <https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html>
> > > to use with each field in the query.
> > > Ex: 'contents' field might have been analyzed using a StandardAnalyzer
> > > <https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html>
> > > and the 'path' field might not have been analyzed at all.
> > > If desired, support for raw lucene queries with a reserved word should be
> > > easy to add.
> > > Ex: select * from dfs.`search-index` where searchQuery =
> > > "+contents:maxItemsPerBlock +path:/home/file.txt";
> > > *Converting SqlFilter to Lucene Query:* Currently only "=" and "!="
> > > operators are handled while converting a sql filter into a lucene query.
> > > For indexed fields this might be sufficient to handle a good number of
> > > cases. For non-indexed fields operators like ">,<, like etc" need to be
> > > handled.
> > > *FileSystems:* Currently the format plugin only works on a local
> > > filesystem.
> > >
> > > Though far from complete, I want to work with the community to get some
> > > feedback and avoid any chance of duplication of work. Kindly let me know
> > > your thoughts
> > >
> > > - Rahul
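
For readers following the thread, the "=" / "!=" conversion described in the original message can be sketched roughly as below. This is only an illustration, not the plugin's code: the function name `to_lucene_query` and the (field, operator, value) tuple layout are hypothetical, and the sketch renders a raw Lucene query string (the "+"/"-" clause syntax used in the searchQuery example) rather than building TermQuery and BooleanQuery objects. It also assumes values contain no characters that need escaping in Lucene query syntax.

```python
from typing import List, Tuple

# Hypothetical mapping from the two SQL operators mentioned in the thread
# to Lucene clause prefixes: "+" (must match) and "-" (must not match).
OCCUR_PREFIX = {"=": "+", "!=": "-"}


def to_lucene_query(conditions: List[Tuple[str, str, str]]) -> str:
    """Render (field, operator, value) conditions as a raw Lucene query string.

    Assumption: values need no escaping. Operators other than "=" and "!="
    are rejected, mirroring the limitation described in the email.
    """
    clauses = []
    for field, op, value in conditions:
        if op not in OCCUR_PREFIX:
            raise ValueError(f"unsupported operator: {op!r}")
        clauses.append(f"{OCCUR_PREFIX[op]}{field}:{value}")
    # All conditions are ANDed together, one clause per condition.
    return " ".join(clauses)


if __name__ == "__main__":
    # The sample filter from the original message.
    print(to_lucene_query([
        ("contents", "=", "maxItemsPerBlock"),
        ("contents", "=", "BlockTreeTermsIndex"),
    ]))
```

One caveat for any real implementation: a Lucene boolean query made up only of negative clauses matches no documents, so a filter containing only "!=" conditions would need extra handling.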
