[Jakarta Lucene Wiki] Updated: LuceneFAQ

lucene-cvs Thu, 30 Dec 2004 13:19:06 -0800

   Date: 2004-12-30T13:19:03
   Editor: DanielNaber
   Wiki: Jakarta Lucene Wiki
   Page: LuceneFAQ
   URL: http://wiki.apache.org/jakarta-lucene/LuceneFAQ


   no comment

Change Log:

------------------------------------------------------------------------------
@@ -445,6 +445,17 @@
 See article [http://www-106.ibm.com/developerworks/library/j-lucene/ Parsing, indexing, and searching XML with Digester and Lucene].
 
 
+==== How can I index OpenOffice.org files? ====
+
+These files (.sxw, .sxc, etc.) are ZIP archives that contain XML files. Uncompress
+the file using Java's ZIP support, then parse meta.xml to get the title and other
+metadata, and content.xml to get the document's content. Add these to the Lucene index,
+typically using one Lucene field per property.
+
+Note that this applies to OpenOffice.org 1.x. Details might change a bit for OpenOffice.org
+2.x, but the basic approach will still be the same.
+
+
 ==== How can I index MS-Word documents? ====
 
 In order to index Word documents you need to first parse them to extract text that you want to index from them.  Here are some Word parsers that can help you with that:
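
The steps described in the new OpenOffice.org entry above (unzip, parse meta.xml, parse content.xml) could be sketched in Java roughly as follows. This is only a minimal sketch, not part of the wiki change: the class name OpenOfficeTextExtractor, the method names, and the field names "title" and "content" are made up for illustration, and it assumes the title is stored in a dc:title element inside meta.xml. It uses only the JDK (java.util.zip and SAX) and returns a map with one entry per property, leaving the actual Lucene calls out.

```java
import java.io.File;
import java.io.InputStream;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class OpenOfficeTextExtractor {

    /**
     * Collects character data from an XML stream. If onlyElement is null, all
     * text is collected; otherwise only text inside elements with that
     * qualified name (for example "dc:title", an assumed element name).
     */
    static String collectText(InputStream xml, final String onlyElement) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            // depth > 0 means "we are inside a region whose text we want"
            private int depth = (onlyElement == null) ? 1 : 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(onlyElement)) depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(onlyElement)) depth--;
                if (depth > 0) text.append(' '); // keep words of adjacent elements apart
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (depth > 0) text.append(ch, start, length);
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // The XML may reference a DTD that is not inside the archive;
                // hand the parser an empty one instead of trying to load it.
                return new InputSource(new StringReader(""));
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return text.toString().trim().replaceAll("\\s+", " ");
    }

    /** Returns one map entry per property; each could become one Lucene field. */
    static Map<String, String> extractFields(File ooFile) throws Exception {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        ZipFile zip = new ZipFile(ooFile);
        try {
            ZipEntry meta = zip.getEntry("meta.xml");
            if (meta != null) {
                fields.put("title", collectText(zip.getInputStream(meta), "dc:title"));
            }
            ZipEntry content = zip.getEntry("content.xml");
            if (content != null) {
                fields.put("content", collectText(zip.getInputStream(content), null));
            }
        } finally {
            zip.close();
        }
        return fields;
    }
}
```

Calling OpenOfficeTextExtractor.extractFields(new File("report.sxw")) would then give a map whose entries can be added to a Lucene Document, one field per entry (in Lucene 1.4, for instance, via Field.Text(name, value)). Collecting all character data from content.xml is a deliberate simplification: it ignores document structure and treats everything as one searchable text.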

