Hello I would like to store data retrieved hourly from RSS feeds in a database or in Lucene so that the text can be easily indexed for word frequencies.
I need to get the text from the title and description elements of RSS items. Ideally, for each hourly retrieval from a given feed, I would add a row to a table in a dataset made up of the following columns: feed_url, title_element_text, description_element_text, polling_date_time >From this, I can look up any element in a feed and calculate keyword >frequencies based upon the length of time required. This can be done as a database table and hashmaps used to calculate word frequencies. But can I do this in Lucene to this degree of granularity at all? If so, would each feed form a Lucene document or would each 'row' from the database table form one? Can anyone advise? Thanks Martin O'Shea. -- --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
