Hello BaseX list,

I'm completely new to BaseX and a bit overwhelmed by the resources I have
found so far in the wiki.
So please forgive me for asking a novice question.

My question:
Is BaseX capable of handling TEI XML files under the following circumstances?
  # of TEI-files: ~10^7
  # of directories in which these files are stored: ~10^5
  # of words in TEI/body to be indexed: ~5*10^9
  yearly increment: 10^9 words in about 10^6 files

The main concern is full-text search within TEI/body, which must be fast:
users will query the database interactively with full-text searches.

Indexing the aforementioned amount of data should be achievable in
reasonable time, say:
- initial indexing may take a few days, if necessary
- incremental(?) indexing of new data should be an overnight job
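
To put rough numbers on these targets, here is a back-of-the-envelope sketch
in Python. The corpus figures are the ones listed above; the three-day window
for the initial build and the eight-hour nightly window are only assumptions
I picked for illustration, not hard requirements.

```python
# Corpus figures taken from the list above.
TOTAL_FILES = 10**7
TOTAL_DIRS = 10**5
TOTAL_WORDS = 5 * 10**9
YEARLY_WORDS = 10**9
YEARLY_FILES = 10**6

# Assumed time windows (illustrative only, not fixed requirements).
INITIAL_DAYS = 3
NIGHTLY_HOURS = 8


def words_per_second(words: float, seconds: float) -> float:
    """Sustained indexing rate needed to process `words` in `seconds`."""
    return words / seconds


# Average shape of the corpus.
avg_words_per_file = TOTAL_WORDS / TOTAL_FILES
avg_files_per_dir = TOTAL_FILES / TOTAL_DIRS

# Rate needed to build the initial index within the assumed window.
initial_rate = words_per_second(TOTAL_WORDS, INITIAL_DAYS * 24 * 3600)

# Daily share of the yearly increment, and the rate for one nightly run.
daily_words = YEARLY_WORDS / 365
daily_files = YEARLY_FILES / 365
nightly_rate = words_per_second(daily_words, NIGHTLY_HOURS * 3600)

print(f"avg words per file:     {avg_words_per_file:,.0f}")
print(f"avg files per dir:      {avg_files_per_dir:,.0f}")
print(f"initial build, words/s: {initial_rate:,.0f}")
print(f"new files per day:      {daily_files:,.0f}")
print(f"new words per day:      {daily_words:,.0f}")
print(f"nightly run, words/s:   {nightly_rate:,.0f}")
```

Under these assumptions the nightly increment looks small compared with the
initial build, so the initial indexing and the interactive query speed are
what I am really unsure about.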

Can I give BaseX a try? Or should I look elsewhere?

Cheers,
Matthias




