We have a number of internal systems here (content management, bug database, support email, CRM), all of which are PHP/MySQL combos. In every case Lucene does the indexing, and we have never seen any reason to go to XML as an intermediate step. We've been at this for six months or so. The only hassle is that if the group doing the PHP/MySQL work tweaks the schema, they have to remember to modify the Lucene indexer so that, say, it picks up the new columns. There's no way around this unless you want to be very generic, in which case XML still doesn't give you anything, since you could just as well use JDBC metadata to get all the columns...
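
The "very generic" option mentioned above might look roughly like the following. This is only a sketch under assumptions: the class name `GenericRowReader` and the method `rowToFields` are hypothetical, and the step that turns the resulting name/value pairs into a Lucene Document is left as a comment, since it depends on the Lucene version in use. It relies only on the standard `java.sql` interfaces.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: turns the current row of any ResultSet into name/value pairs. */
public final class GenericRowReader {

    private GenericRowReader() {}

    /**
     * Reads every column of the row the cursor is currently positioned on.
     * Column names come from JDBC metadata, so a new column in the schema
     * is picked up without changing this code.
     */
    public static Map<String, String> rowToFields(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        Map<String, String> fields = new LinkedHashMap<>();
        // JDBC column indexes are 1-based.
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            String value = rs.getString(i);
            if (value != null) {
                // SQL NULLs are skipped; each remaining pair would become
                // one field on the Lucene Document (not shown here).
                fields.put(meta.getColumnLabel(i), value);
            }
        }
        return fields;
    }
}
```

Each entry in the returned map would then become one field on the Document handed to IndexWriter. Which columns to tokenize and which to store as-is remains a per-column decision that this sketch does not make.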

-----Original Message-----
From: Michael Caughey [mailto:michael@;caughey.com]
Sent: Friday, November 08, 2002 4:21 PM
To: Spencer, Dave; Lucene Users List
Subject: Re: Indexing Db Table -- Better way request

Converting straight to a document seemed to me the best answer as I started to investigate. Somewhere along the line I thought I remembered seeing a suggestion that it was for some reason better to convert to XML and then add it as an XML document. I'd rather not have the hassle of creating and then later parsing the XML. I could not find the reference again.

This in part was what I was hoping to hear.

Thanks,
Michael

----- Original Message -----
From: "Spencer, Dave" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, November 08, 2002 6:59 PM
Subject: RE: Indexing Db Table -- Better way request

One small comment: what's the point of converting a row to XML? What I think you want to do is convert a row to a Document and then pass that off to IndexWriter.

-----Original Message-----
From: Caughey, Michael [mailto:mcaughey@;trigon.com]
Sent: Friday, November 08, 2002 2:22 PM
To: '[EMAIL PROTECTED]'
Cc: '[EMAIL PROTECTED]'
Subject: Indexing Db Table -- Better way request

Hello,

I'm new to Lucene and this group; if it is improper to send such a message to this group, I apologize. I tried to do a reasonable amount of up-front research before coming here.

I'm about to undertake a piece of my project where I've decided that Lucene will be of use. I have been researching, over the past two weeks, ways to accomplish this. I know I'll use an IndexWriter to write the index to a file, but I'm having difficulty settling on how to process the data to be indexed.

What I have is a table in a MySQL database called items. I want to be able to search on a couple of fields and have it return the ID:

Fields:
=========
Name          VARCHAR (80)
Description   TEXT
Location      VARCHAR (80)
Qty           int
ExpireDate    Long YYYYMMDD
Category      int
ListingPrice  FLOAT(9,2)
Supplier      int

Return
=========
ItemId        int

On start up of the application every row in the database will be read. After that I need to keep the table and the index in sync. Data in the columns can change, and rows can be added and removed. I have a central entity controller which is responsible for all access to that table.

I figured one approach which would work would be, on start up, to read each row, build an XML document, and submit it to the IndexWriter. As inserts, deletes and updates occurred I could modify both Lucene and the database. Seems simple enough, and may be the only way to handle it. Before I did it I wanted to make sure that there wasn't a better way.

Are there documents which can automatically read the table and build a document?
Should I read the row and just build fields and construct a document?

Does anyone see any problems with storing it in memory versus writing it to a file? Or should I say, at what point would you consider writing it to a file? Would you base that on total document size? I feel that a file index will most likely be just fine.

Thanks in advance for any suggestions.

Michael Caughey
