Hi Santhosh,

Try out the below attached code.....(POI.jar should be in your class
path)


public String getContent(InputStream reader) throws IOException {
                    ArrayList text = new ArrayList();
                    POIFSFileSystem fsys = new POIFSFileSystem(reader);

                    DocumentEntry headerProps =
(DocumentEntry)fsys.getRoot().getEntry("WordDocument");
                    DocumentInputStream din =
fsys.createDocumentInputStream("WordDocument");
                    byte[] header = new byte[headerProps.getSize()];

                    din.read(header);
                    din.close();

                    //Get the information we need from the header
                    int info = LittleEndian.getShort(header, 0xa);
                    boolean useTable1 = (info & 0x200) != 0;

                    //get the location of the piece table
                    int complexOffset = LittleEndian.getInt(header,
0x1a2);

                    String tableName = null;
                    if (useTable1) {
                      tableName = "1Table";
                    }
                    else{
                      tableName = "0Table";
                    }

                    DocumentEntry table =
(DocumentEntry)fsys.getRoot().getEntry(tableName);
                    byte[] tableStream = new byte[table.getSize()];
                    din = fsys.createDocumentInputStream(tableName);
                    din.read(tableStream);
                        din.close();

                    din = null;
                    fsys = null;
                    table = null;
                    headerProps = null;

                    int multiple = findText(tableStream, complexOffset,
text);

                    StringBuffer sb = new StringBuffer();
                    int size = text.size();
                    tableStream = null;

                        WordTextPiece nextPiece = null;
                        int start ;
                        int length;
                        String toStr = "";
                        for (int x = 0; x < size; x++) {
                                nextPiece = (WordTextPiece)text.get(x);
                                start = nextPiece.getStart();
                                length = nextPiece.getLength();

                                boolean unicode =
nextPiece.usesUnicode();
                                if (unicode) {
                                        toStr = new String(header,
start, length * multiple, "UTF-16LE"); 
                                }
                                else{ 
                                        toStr = new String(header,
start, length , "ISO-8859-1"); 
                                } 
                                
                        }

                        reader.close();
                        return toStr;
                }


Regards,
Natarajan.



-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 24, 2004 5:46 PM
To: Lucene Users List
Subject: worddoucments search

Can lucene be able to search word documents? if so please give me
information about it

regards
Santosh kumar


-----------------------SOFTPRO DISCLAIMER------------------------------

Information contained in this E-MAIL and any attachments are
confidential being  proprietary to SOFTPRO SYSTEMS  is 'privileged'
and 'confidential'.

If you are not an intended or authorised recipient of this E-MAIL or
have received it in error, You are notified that any use, copying or
dissemination  of the information contained in this E-MAIL in any
manner whatsoever is strictly prohibited. Please delete it immediately
and notify the sender by E-MAIL.

In such a case reading, reproducing, printing or further dissemination
of this E-MAIL is strictly prohibited and may be unlawful.

SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment
hereto is free from computer viruses or other defects. 

The opinions expressed in this E-MAIL and any ATTACHEMENTS may be
those of the author and are not necessarily those of SOFTPRO SYSTEMS.
------------------------------------------------------------------------


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to