Sorry Mike,
I pasted the old code. I have already added something like this to index
with TermVector:
String xpto = fr.toString();
doc.add(new Field("contents2", xpto,
Field.Store.YES,
Field.Index.ANALYZED,
Field.TermVector.YES));
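One thing worth checking in the snippet above: as far as I know, FileReader
does not override toString(), so fr.toString() would give an object
identifier (something like java.io.FileReader@1b2c3d) rather than the text
of the file. Below is a minimal sketch of reading the file into a String
first. The class name FileText and the helper readAll are hypothetical names
used only for illustration, and the sketch assumes the files are UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

final class FileText {

    // Hypothetical helper (not part of Lucene): reads a whole text file
    // into a String. Assumes the file is encoded as UTF-8.
    static String readAll(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "UTF-8"));
        try {
            char[] buf = new char[4096];
            int n;
            // Copy the file in chunks until end of stream.
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } finally {
            in.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: FileText <path-to-text-file>");
            return;
        }
        File f = new File(args[0]);
        String text = readAll(f);
        System.out.println(text.length() + " characters read from " + f);
    }
}
```

The returned String could then be passed as the value of the "contents2"
field in place of fr.toString(). If I recall correctly, Lucene 3.x also has
a Field constructor that takes a Reader together with a Field.TermVector
value, which would avoid loading the whole file into memory, but please
check the javadoc for your version.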
My approach to the problem is probably not the right one, so let me explain
better what I want.
My idea is to take some files, such as .txt files, and get the TermVector
for each one. I don't know whether this can be done simply by indexing the
files.
Thanks Mike. :)
Luis Paiva
-----Original Message-----
From: Michael McCandless [mailto:[email protected]]
Sent: Monday, April 2, 2012 20:49
To: [email protected]
Subject: Re: TVD, TVX and TVF files
As far as I can see, you are not indexing term vectors in the code
below? Your Fields don't have TermVector.*...
Can you boil this down to a small test case showing the missing term
vector files...?
Mike McCandless
http://blog.mikemccandless.com
On Mon, Apr 2, 2012 at 1:28 PM, Luis Paiva <[email protected]>
wrote:
> Thank you for your help.
> I still haven't found a solution yet. I'm copying all my code below.
>
> BTW, I'm working with Lucene version 3.5.0
>
> @Mike: Yes, I do close it :) These files are created: .fdt, .fdx,
> .fnm, .frq, .nrm, .prx, .tii, .tis.
>
> I don't know why the T* files are not created.
>
> @Uwe: I think I'm not getting any compound files. Only those above.
>
> Does anyone have the same issue?
>
>
>
> CODE --------------------------- xx -------------------------------
>
>
> package lucene;
>
> import java.io.*;
> import java.util.ArrayList;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.store.FSDirectory;
> import org.apache.lucene.util.Version;
>
> /**
> * This terminal application creates an Apache Lucene index in a folder and
> * adds files into this index based on the input of the user.
> */
> public class TextFileIndexer {
>
> private IndexWriter writer;
> private ArrayList<File> queue = new ArrayList<File>();
>
> public static void main(String[] args) throws IOException {
> System.out.println("Enter the path where the index will be created: ");
>
> BufferedReader br = new BufferedReader(
> new InputStreamReader(System.in));
> String s = br.readLine();
>
> TextFileIndexer indexer = null;
> try {
> indexer = new TextFileIndexer(s);
> } catch (Exception ex) {
> System.out.println("Cannot create index..." + ex.getMessage());
> System.exit(-1);
> }
>
> //===================================================
> //read input from user until he enters q for quit
> //===================================================
> while (!s.equalsIgnoreCase("q")) {
> try {
> System.out.println("Enter the file or folder name to add into the index (q=quit):");
> System.out.println("[Acceptable file types: .xml, .htm, .html, .txt]");
> s = br.readLine();
> if (s.equalsIgnoreCase("q")) {
> break;
> }
>
> //try to add file into the index
> indexer.indexFileOrDirectory(s);
> } catch (Exception e) {
> System.out.println("Error indexing " + s + " : " + e.getMessage());
> }
> }
>
> //===================================================
> //after adding, we always have to call the
> //closeIndex, otherwise the index is not created
> //===================================================
> indexer.closeIndex();
> }
>
> /**
> * Constructor
> * @param indexDir the name of the folder in which the index should be created
> * @throws java.io.IOException
> */
> TextFileIndexer(String indexDir) throws IOException {
> // No OpenMode is set on the config, so the default (CREATE_OR_APPEND)
> // applies: an existing index in this folder is appended to, not overwritten.
> FSDirectory dir = FSDirectory.open(new File(indexDir));
>
> StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);
>
> IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34,
> analyzer);
>
> writer = new IndexWriter(dir, config);
> }
>
> /**
> * Indexes a file or directory
> * @param fileName the name of a text file or a folder we wish to add to the index
> * @throws java.io.IOException
> */
> public void indexFileOrDirectory(String fileName) throws IOException {
> //===================================================
> //gets the list of files in a folder (if user has submitted
> //the name of a folder) or gets a single file name (is user
> //has submitted only the file name)
> //===================================================
> addFiles(new File(fileName));
>
> int originalNumDocs = writer.numDocs();
> for (File f : queue) {
> FileReader fr = null;
> try {
> Document doc = new Document();
>
> //===================================================
> // add contents of file
> //===================================================
> fr = new FileReader(f);
> doc.add(new Field("contents", fr));
>
>
>
> //===================================================
> //adding second field which contains the path of the file
> //===================================================
> doc.add(new Field("path", f.getPath(),
> Field.Store.YES,
> Field.Index.NOT_ANALYZED));
>
> writer.addDocument(doc);
> System.out.println("Added: " + f);
> } catch (Exception e) {
> System.out.println("Could not add: " + f + " : " + e.getMessage());
> } finally {
> if (fr != null) {
> fr.close();
> }
> }
> }
>
> int newNumDocs = writer.numDocs();
> System.out.println("");
> System.out.println("************************");
> System.out.println((newNumDocs - originalNumDocs) + " documents added.");
> System.out.println("************************");
>
> queue.clear();
> }
>
> private void addFiles(File file) {
>
> if (!file.exists()) {
> System.out.println(file + " does not exist.");
> return;
> }
> if (file.isDirectory()) {
> for (File f : file.listFiles()) {
> addFiles(f);
> }
> } else {
> String filename = file.getName().toLowerCase();
> //===================================================
> // Only index text files
> //===================================================
> if (filename.endsWith(".htm") || filename.endsWith(".html") ||
> filename.endsWith(".xml") || filename.endsWith(".txt")) {
> queue.add(file);
> } else {
> System.out.println("Skipped " + filename);
> }
> }
> }
>
> /**
> * Close the index.
> * @throws java.io.IOException
> */
> public void closeIndex() throws IOException {
> writer.close();
> }
> }
>
> END OF CODE --------------------------- xx -------------------------------
>
>
> -----Original Message-----
> From: Uwe Schindler [mailto:[email protected]]
> Sent: Tuesday, March 27, 2012 19:19
> To: [email protected]
> Subject: RE: TVD, TVX and TVF files
>
> Maybe you only see CFS files? If this is the case, your index is in compound
> file format. In that case (the default), to get the raw files, disable
> compound files in the merge policy!
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
>> -----Original Message-----
>> From: Michael McCandless [mailto:[email protected]]
>> Sent: Tuesday, March 27, 2012 8:13 PM
>> To: [email protected]
>> Subject: Re: TVD, TVX and TVF files
>>
>> The code seems OK on quick glance...
>>
>> Are you closing the writer?
>>
>> Are you hitting any exceptions?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Mar 27, 2012 at 12:19 PM, Luis Paiva <[email protected]>
>> wrote:
>> > Hey all,
>> >
>> > I'm taking my first steps in Lucene.
>> > I was trying to index some txt files, and my program doesn't create
>> > the term vector files. I need these files (.tvd, .tvx, .tvf).
>> >
>> > I'm attaching my code so anyone can help me.
>> > Thank you all in advance!
>> >
>> > Sorry if I'm repeating the question, but I couldn't find the answer to it.
>> >
>> >
>> > public void indexFileOrDirectory(String fileName) throws IOException {
>> >
>> > addFiles(new File(fileName));
>> >
>> > int originalNumDocs = writer.numDocs();
>> > for (File f : queue) {
>> > FileReader fr = null;
>> > try {
>> > Document doc = new Document();
>> >
>> > fr = new FileReader(f);
>> > doc.add(new Field("contents", fr));
>> >
>> > doc.add(new Field("path", fileName, Field.Store.YES,
>> > Field.Index.NOT_ANALYZED));
>> >
>> > String xpto = "xpto1 xpto2 xpto3";
>> > doc.add(new Field("contents2", xpto, Field.Store.YES,
>> > Field.Index.ANALYZED, Field.TermVector.YES));
>> >
>> > writer.addDocument(doc);
>> > System.out.println("Added: " + f);
>> > } catch (Exception e) {
>> > System.out.println("Could not add: " + f);
>> > } finally {
>> > if (fr != null) {
>> > fr.close();
>> > }
>> > }
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [email protected]
>> > For additional commands, e-mail: [email protected]