Juan Jose Pablos wrote:
FYI:

Ricardo Beltran wrote:

I've CC'd Ricardo on this reply - please reply all.

...

My questions are: Do you think that Forrest is an appropriate framework
for this purpose? And do you think that Lucene or
Google will do the job of indexing about 5 GB of XML
files?

I can't comment with authority on the suitability of Google or Lucene for this, as I have no experience with them. My gut feeling is that neither is the optimal solution here.

I do have a project that has around 8Gb of dynamic data being published via the Forrest webapp.

The solution I employed, and one that appears to be working well, was to keep the data in an XML-enabled database. In this case we used Oracle, but we have successfully used XIndice and eXist in similar, smaller projects in the past. I wrote a custom generator to retrieve the data from the DBMS.
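To make that concrete, here is a self-contained sketch (not our actual code) of the core of such a generator: fetch rows and stream them out as SAX events. The class name, element names, and stubbed in-memory rows are all made up for the example; a real Cocoon generator would extend org.apache.cocoon.generation.AbstractGenerator and write to its contentHandler rather than to a StringWriter, and the rows would come from a DBMS query.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical, simplified sketch of what a custom Cocoon-style generator does:
// take query results (stubbed here as in-memory String[] rows) and emit them
// as a stream of SAX events forming an XML document.
public class RowGeneratorSketch {
    public static String generate(List<String[]> rows) throws Exception {
        StringWriter out = new StringWriter();
        SAXTransformerFactory f =
            (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler h = f.newTransformerHandler();
        h.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        h.setResult(new StreamResult(out));

        AttributesImpl noAttrs = new AttributesImpl();
        h.startDocument();
        h.startElement("", "records", "records", noAttrs);
        for (String[] row : rows) {
            // each row: [id, value] -- stand-ins for whatever columns the query returns
            h.startElement("", "record", "record", noAttrs);
            h.startElement("", "id", "id", noAttrs);
            h.characters(row[0].toCharArray(), 0, row[0].length());
            h.endElement("", "id", "id");
            h.startElement("", "value", "value", noAttrs);
            h.characters(row[1].toCharArray(), 0, row[1].length());
            h.endElement("", "value", "value");
            h.endElement("", "record", "record");
        }
        h.endElement("", "records", "records");
        h.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(generate(List.of(
            new String[]{"1", "sensor-a"},
            new String[]{"2", "sensor-b"})));
    }
}
```

The point of streaming SAX events rather than building a DOM is that the pipeline never holds the whole result set in memory, which matters at the multi-gigabyte scale being discussed.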

It should be noted that Cocoon has some database components that can be utilised (the whiteboard plugin org.apache.forrest.plugin.Database contains the results of some early experiments I did with these components). The reason I never completed work on that plugin was not a problem with the plugin itself, but additional requirements that made it easier to build a custom generator (our requests were also dependent on live data from sensor readings over an RS232 port).

The system has now been running for about 3 months and we are very happy with it. Because we are using a database server as the repository, we have all the indexing and optimisation provided by that server. We also have the benefit of a very expressive and mature search language.
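As an illustration of what that search language buys you, a query against an eXist-style repository might look something like this (the collection path and element names are invented for the example):

```xquery
(: hypothetical collection and element names :)
for $r in collection('/db/docs')//record[contains(value, 'sensor')]
order by $r/id
return $r
```

Filtering, ordering, and restructuring like this come for free from the database, with the server's indexes doing the heavy lifting, instead of having to be bolted onto a pile of flat XML files.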

Of course, this solution requires that you run the system dynamically. Using Google to index your site would allow you to run statically. Trying to build a static site from 5 GB of data would be a wonderful stress test; if you do this, please report your findings to us.

Ross
