Code example for textmining.org library:
FileInputStream in = new FileInputStream ("test.doc");
WordExtractor extractor = new WordExtractor();
String str = extractor.extractText();
----- Original Message -----
From: "Natarajan.T" <[EMAIL PROTECTED]>
To: "'Lucene Users List'" <[EMAIL PROTECTED]>
Sent: Tuesday, August 24, 2004 8:11 AM
Subject: RE: worddoucments search
> Hi Santhosh,
>
> Try out the below attached code.....(POI.jar should be in your class
> path)
>
>
> public String getContent(InputStream reader) throws IOException {
> ArrayList text = new ArrayList();
> POIFSFileSystem fsys = new POIFSFileSystem(reader);
>
> DocumentEntry headerProps =
> (DocumentEntry)fsys.getRoot().getEntry("WordDocument");
> DocumentInputStream din =
> fsys.createDocumentInputStream("WordDocument");
> byte[] header = new byte[headerProps.getSize()];
>
> din.read(header);
> din.close();
>
> //Get the information we need from the header
> int info = LittleEndian.getShort(header, 0xa);
> boolean useTable1 = (info & 0x200) != 0;
>
> //get the location of the piece table
> int complexOffset = LittleEndian.getInt(header,
> 0x1a2);
>
> String tableName = null;
> if (useTable1) {
> tableName = "1Table";
> }
> else{
> tableName = "0Table";
> }
>
> DocumentEntry table =
> (DocumentEntry)fsys.getRoot().getEntry(tableName);
> byte[] tableStream = new byte[table.getSize()];
> din = fsys.createDocumentInputStream(tableName);
> din.read(tableStream);
> din.close();
>
> din = null;
> fsys = null;
> table = null;
> headerProps = null;
>
> int multiple = findText(tableStream, complexOffset,
> text);
>
> StringBuffer sb = new StringBuffer();
> int size = text.size();
> tableStream = null;
>
> WordTextPiece nextPiece = null;
> int start ;
> int length;
> String toStr = "";
> for (int x = 0; x < size; x++) {
> nextPiece = (WordTextPiece)text.get(x);
> start = nextPiece.getStart();
> length = nextPiece.getLength();
>
> boolean unicode =
> nextPiece.usesUnicode();
> if (unicode) {
> toStr = new String(header,
> start, length * multiple, "UTF-16LE");
> }
> else{
> toStr = new String(header,
> start, length , "ISO-8859-1");
> }
>
> }
>
> reader.close();
> return toStr;
> }
>
>
> Regards,
> Natarajan.
>
>
>
> -----Original Message-----
> From: Santosh [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 24, 2004 5:46 PM
> To: Lucene Users List
> Subject: worddoucments search
>
> Can lucene be able to search word documents? if so please give me
> information about it
>
> regards
> Santosh kumar
>
>
> -----------------------SOFTPRO DISCLAIMER------------------------------
>
> Information contained in this E-MAIL and any attachments are
> confidential being proprietary to SOFTPRO SYSTEMS is 'privileged'
> and 'confidential'.
>
> If you are not an intended or authorised recipient of this E-MAIL or
> have received it in error, You are notified that any use, copying or
> dissemination of the information contained in this E-MAIL in any
> manner whatsoever is strictly prohibited. Please delete it immediately
> and notify the sender by E-MAIL.
>
> In such a case reading, reproducing, printing or further dissemination
> of this E-MAIL is strictly prohibited and may be unlawful.
>
> SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment
> hereto is free from computer viruses or other defects.
>
> The opinions expressed in this E-MAIL and any ATTACHEMENTS may be
> those of the author and are not necessarily those of SOFTPRO SYSTEMS.
> ------------------------------------------------------------------------
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]