Daniel Knapp
Thu, 26 Nov 2009 04:08:35 -0800
>>> With Tika 0.5 you could do something as simple as this:
>>>
>>>     import org.apache.tika.Tika;
>>>
>>>     Reader reader = new Tika().parse(file);
>>>
>>> You can then read the parse result incrementally from the reader
>>> object, or pass the reader for example to a Lucene Document for
>>> indexing.
>>
>> I've read about that. But i don't know how to check when the end of a
>> file is reached and merge the result with the related Metadata.
>
> You could also do the following:
>
>     Metadata metadata = new Metadata();
>     metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
>     Reader reader =
>         new Tika().parse(new FileInputStream(file), metadata);
>
> Most of the extracted metadata will be available as soon as the
> parse() method returns so you don't need to wait until you've read the
> entire stream first.
>
> The read() methods of the reader will return -1 when you've reached
> the end of the file. Note also that unlike with the Parser.parse()
> call, the InputStream you pass to Tika.parse() will get closed when
> you call the close() method on the returned Reader.

Okay, thanks for that hint. But how should I deal with the extracted content?
Currently I'm using the following code:
Reader read = tik.parse(input, md);
System.out.println(o + " - " + file.getAbsolutePath());
System.out.println("Content-Type: " + md.get("Content-Type")
+ "\n");
BufferedReader br = new BufferedReader(read);
String tmp = "";
StringBuilder sb = new StringBuilder();
while ((tmp = br.readLine()) != null) {
sb.append(tmp);
}
System.out.print(sb.toString());
br.close();
read.close();
But I think this can't be the best solution: memory usage keeps growing. Is
this use case so exotic, or am I overlooking a detail?
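
For comparison, here is a minimal sketch of reading the content in fixed-size chunks and passing each chunk on immediately, instead of collecting everything in a StringBuilder. It relies only on the read() contract quoted above (the method returns -1 at the end of the stream). The class name StreamingCopy and the copy() helper are made up for illustration, and a StringReader stands in for the Reader returned by Tika.parse(), so this has not been tried against Tika itself.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of Tika.
class StreamingCopy {

    // Copies all characters from in to out in 4096-char chunks and returns
    // the number of characters copied. Only one chunk is buffered at a time.
    static long copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[4096];
        long total = 0;
        int n;
        // read(char[]) returns -1 once the end of the stream is reached.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: StringReader is only a stand-in for the Tika reader.
        try (Reader in = new StringReader("line one\nline two\n");
             Writer out = new StringWriter()) {
            long copied = copy(in, out);
            System.out.println(copied + " characters copied");
        }
    }
}
```

In real use the Writer would be a file or an indexer rather than a StringWriter, so the extracted text never has to fit in memory as a whole. Unlike the readLine() loop, this also keeps the line breaks.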
>
> BR,
>
> Jukka Zitting