All - My document has multiple occurrences of the word quartz, but using totalTermFrequency summing for all terms appears to be working. Is this correct?
If so thanks for your help. I was really stuck. I owe everyone a beer! Doug On Mon, Nov 30, 2015 at 7:42 AM, Allison, Timothy B. <talli...@mitre.org> wrote: > I'm getting this (with a single document that contains the word 'quartz': > > Term freq(indexReader.totalTermFreq(term))=0 > Term freq(indexReader.getSumTotalTermFreq("Doc"))=1 > totalHits = 1 > termStatics=0 > > Is this what you're getting? So...the search is working, but the term > counts aren't returning what you'd expect? > > > -----Original Message----- > From: Douglas Kunzma [mailto:dkunzman...@gmail.com] > Sent: Monday, November 30, 2015 6:59 AM > To: java-user@lucene.apache.org > Subject: Wild card search not working > > Hi - > > I've created a test program where I've been struggling with for a couple > of days trying to get wild card searches working in Lucene. I have some > Solr experience but this is the first time that I'm working with Lucene. > I've copied the entire program to github here: > > luceneseach2 <https://github.com/ollie70/luceneseach2>/ > *IndexTester.java* > Please let me know what I'm doing wrong. I'm using Lucene 5.3.1 > > I also paste the small program below. It is self contained. > > I've checked all of the usual stuff like using a TextField. 
> > Thanks, Doug > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > *import java.io.File;import java.io.IOException;import > java.nio.charset.Charset;import java.nio.file.FileSystems;import > java.nio.file.Files;import java.nio.file.Path;import > java.nio.file.Paths;import java.util.Calendar;import java.util.List;import > java.util.Vector;import org.apache.commons.lang.mutable.MutableLong;import > org.apache.log4j.LogManager;import org.apache.log4j.Logger;import > org.apache.lucene.analysis.Analyzer;import > org.apache.lucene.analysis.standard.StandardAnalyzer;import > org.apache.lucene.document.Document;import > org.apache.lucene.document.Field;import > org.apache.lucene.document.LongField;import > org.apache.lucene.document.StringField;import > org.apache.lucene.document.TextField;import > org.apache.lucene.index.DirectoryReader;import > org.apache.lucene.index.IndexReader;import > org.apache.lucene.index.IndexReaderContext;import > org.apache.lucene.index.IndexWriter;import > org.apache.lucene.index.IndexWriterConfig;import > org.apache.lucene.index.Term;import > org.apache.lucene.index.TermContext;import > org.apache.lucene.queryparser.classic.ParseException;import > org.apache.lucene.search.IndexSearcher;import > org.apache.lucene.search.Query;import > org.apache.lucene.search.ScoreDoc;import > org.apache.lucene.search.TermQuery;import > org.apache.lucene.search.TermStatistics;import > org.apache.lucene.search.TopDocs;import > org.apache.lucene.search.WildcardQuery;import > org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;import > org.apache.lucene.search.spans.SpanQuery;import > org.apache.lucene.store.Directory;import > 
org.apache.lucene.store.FSDirectory;public class IndexTester { private > static Path path = FileSystems.getDefault().getPath("C:\\temp\\tester"); > private static Analyzer analyzer = new StandardAnalyzer(); final > static Logger log = LogManager.getLogger(IndexTester.class); public > static void main(String args[]) throws IOException, ParseException { > Directory idx = FSDirectory.open(path); > index("C:\\temp\\test_index"); Term term = new Term("Doc", > "quart?"); // must be lower case. WildcardQuery wc = new > WildcardQuery(term); SpanQuery spanTerm = new > SpanMultiTermQueryWrapper<WildcardQuery>(wc); IndexReader > indexReader = DirectoryReader.open(idx); System.out.println("Term freq=" + > indexReader.totalTermFreq(term)); System.out.println("Term freq=" + > indexReader.getSumTotalTermFreq("Doc")); IndexSearcher isearcher = > new IndexSearcher(indexReader); TopDocs docs = > isearcher.search(spanTerm, 1); System.out.println("totalHits = " + > docs.totalHits); IndexReaderContext indexReaderContext = > isearcher.getTopReaderContext(); TermContext context = > TermContext.build(indexReaderContext, term); TermStatistics > termStatistics = isearcher.termStatistics(term, context); > System.out.println("termStatics=" + termStatistics.totalTermFreq()); > } public static List<String> query(Query query, MutableLong totalHits, > MutableLong totalDocs) throws IOException { List<String> > files = new Vector<String>(); Directory idx = > FSDirectory.open(path); DirectoryReader indexReader = > DirectoryReader.open(idx); IndexSearcher isearcher = new > IndexSearcher(indexReader); TopDocs topDocs = > isearcher.search(query, 100); ScoreDoc[] top = > topDocs.scoreDocs; System.out.println(topDocs.totalHits); > totalHits.setValue(topDocs.totalHits); > totalDocs.setValue(top.length); log.trace("top length" + > top.length); for (int i = 0; i < top.length; i++) { int > docID = top[i].doc; Document doc = > isearcher.doc(docID); Path path = > Paths.get(doc.get("Path")); String fileName = > 
path.getFileName().toString(); log.trace("match fileName =" + > fileName); files.add(fileName); } > indexReader.close(); idx.close(); return files; } > public static void index(String dir) throws IOException { > IndexWriterConfig config = new IndexWriterConfig(analyzer); > config.setCommitOnClose(true); > config.setOpenMode(IndexWriterConfig.OpenMode.CREATE); Directory idx > = FSDirectory.open(path); IndexWriter indexWriter = new > IndexWriter(idx, config); List<File> files = lgDir(dir); for > (File f : files) { log.trace("filename=" + > f.getName()); //boolean indexExists = > doesIndexExist(); // log.trace("indexExists" + > indexExists); addDoc(indexWriter, f); } > indexWriter.commit(); indexWriter.close(); idx.close(); > } /** * Add the document to the index. * * @param > writer * @param filePath * @throws IOException */ private > static void addDoc(IndexWriter writer, File filePath) throws IOException > { Document doc = new Document(); //byte[] encoded = > Files.readAllBytes(Paths.get(filePath.getCanonicalPath())); > List<String> lines = > Files.readAllLines(Paths.get(filePath.getCanonicalPath()), > Charset.forName("Cp1252")); StringBuffer buf = new > StringBuffer(); java.util.Iterator<String> iter > =lines.iterator(); while (iter.hasNext()) { > buf.append(iter.next()); buf.append("\n"); } > //String content = new String(encoded, "UTF-8"); //if > (content.length() > 0) { > log.trace(filePath.getCanonicalPath()); doc.add(new > StringField("Path", filePath.getCanonicalPath(), > Field.Store.YES)); Calendar calendar = > Calendar.getInstance(); > calendar.setTimeInMillis(filePath.lastModified()); doc.add(new > LongField("Date", filePath.lastModified(), Field.Store.YES)); // > log.trace("content size" + content.length()); doc.add(new > TextField("Doc", buf.toString(), Field.Store.YES)); > writer.addDocument(doc); writer.commit(); // } } > public static List<File> lgDir(String directory) { File d = new > File(directory); File[] f = d.listFiles(); List<File> myList > = 
new Vector<File>(); for (File f1 : f) { > myList.add(f1); } log.trace("size count of myList =" + > myList.size()); return myList; }}* >
Terracotta.org Ehcache.org Quartz-Scheduler.org Call us: +1-415-738-4000 Quartz Scheduler Job Scheduler Forums Blog Contact Us My Account Overview Community Documentation News Download Welcome Current Documentation Release Notes Previous Versions Other Documents Contents | Prev | Next Example - Job Parameters and Job State This example is designed to demonstrate how you can pass run-time parameters into quartz jobs and how you can maintain state in a job. The program will perform the following actions: Start up the Quartz Scheduler Schedule two jobs, each job will execute every ten seconds for a total of five times The scheduler will pass a run-time job parameter of "Green" to the first job instance The scheduler will pass a run-time job parameter of "Red" to the second job instance The program will wait 60 seconds so that the two jobs have plenty of time to run Shut down the Scheduler Running the Example This example can be executed from the examples/example4 directory. There are two out-of-the-box methods for running this example example4.sh - A UNIX/Linux shell script example4.bat - A Windows Batch file The Code The code for this example resides in the package org.quartz.examples.example4. The code in this example is made up of the following classes: Class Name Description JobStateExample The main program ColorJob A simple job that prints a favorite color (passed in as a run-time parameter) and displays its execution count. 
ColorJob ColorJob is a simple class that implements the Job interface, and is annotated as such: @PersistJobDataAfterExecution @DisallowConcurrentExecution public class ColorJob implements Job { The annotations cause behavior just as their names describe - multiple instances of the job will not be allowed to run concurrently (consider a case where a job has code in its execute() method that takes 34 seconds to run, but it is scheduled with a trigger that repeats every 30 seconds), and will have its JobDataMap contents re-persisted in the scheduler's JobStore after each execution. For the purposes of this example, only the @PersistJobDataAfterExecution annotation is truly relevant, but it's always wise to use the @DisallowConcurrentExecution annotation with it, to prevent race-conditions on saved data. ColorJob logs the following information when the job is executed: The job's identification key (name and group) and time/date of execution The job's favorite color (which is passed in as a run-time parameter) The job's execution count calculated from a member variable The job's execution count maintained as a job map parameter _log.info("ColorJob: " + jobKey + " executing at " + new Date() + "\n" + " favorite color is " + favoriteColor + "\n" + " execution count (from job map) is " + count + "\n" + " execution count (from job member variable) is " + _counter); The variable favoriteColor is passed in as a job parameter. It is retrieved as follows from the JobDataMap: JobDataMap data = context.getJobDetail().getJobDataMap(); String favoriteColor = data.getString(FAVORITE_COLOR); The variable count is stored in the job data map as well: JobDataMap data = context.getJobDetail().getJobDataMap(); int count = data.getInt(EXECUTION_COUNT); The variable is later incremented and stored back into the job data map so that job state can be preserved: count++; data.put(EXECUTION_COUNT, count); There is also a member variable named _counter. 
This variable is defined as a member variable to the class: private int _counter = 1; This variable is also incremented and displayed. However, its count will always be displayed as "1" because Quartz will always instantiate a new instance of the class during each execution - which prevents member variables from being used to maintain state. JobStateExample The program starts by getting an instance of the Scheduler. This is done by creating a StdSchedulerFactory and then using it to create a scheduler. This will create a simple, RAM-based scheduler. SchedulerFactory sf = new StdSchedulerFactory(); Scheduler sched = sf.getScheduler(); Job #1 is scheduled to run every 10 seconds, for a total of five times: JobDetail job1 = newJob(ColorJob.class) .withIdentity("job1", "group1") .build(); SimpleTrigger trigger1 = newTrigger() .withIdentity("trigger1", "group1") .startAt(startTime) .withSchedule(simpleSchedule() .withIntervalInSeconds(10) .withRepeatCount(4)) .build(); Job #1 is passed in two job parameters. One is a favorite color, with a value of "Green". The other is an execution count, which is initialized with a value of 1. job1.getJobDataMap().put(ColorJob.FAVORITE_COLOR, "Green"); job1.getJobDataMap().put(ColorJob.EXECUTION_COUNT, 1); Job #2 is also scheduled to run every 10 seconds, for a total of five times: JobDetail job2 = newJob(ColorJob.class) .withIdentity("job2", "group1") .build(); SimpleTrigger trigger2 = newTrigger() .withIdentity("trigger2", "group1") .startAt(startTime) .withSchedule(simpleSchedule() .withIntervalInSeconds(10) .withRepeatCount(4)) .build(); Job #2 is also passed in two job parameters. One is a favorite color, with a value of "Red". The other is an execution count, which is initialized with a value of 1. job2.getJobDataMap().put(ColorJob.FAVORITE_COLOR, "Red"); job2.getJobDataMap().put(ColorJob.EXECUTION_COUNT, 1); The scheduler is then started. 
sched.start(); To let the program have an opportunity to run the job, we then sleep for one minute (60 seconds) Thread.sleep(60L * 1000L); Finally, we will gracefully shutdown the scheduler: sched.shutdown(true); Note: passing true into the shutdown message tells the Quartz Scheduler to wait until all jobs have completed running before returning from the method call. Projects Ehcache Quartz Scheduler How to get it Download Now Join the Community Sign Up for Training Follow Us LinkedInLinkedin FacebookFacebook TwitterTwitter ©Terracotta, Inc., a wholly-owned subsidiary of Software AG USA, Inc. All rights reserved. Contact Us | Copyright | Privacy Policy | Legal Quartz Test quartz.test quartz.build quartz.building quartz.stepping Quartz license quartz license
--------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org