Hi Greg,

So, to be clear and succinct, the goal is to create a single XML file
containing all XML files that have a predefined text string match in them,
yes?

If so, I'm wondering if creating any database is necessary. A single pass
through all the files, searching for the text string, and appending matched
files as you go seems sufficient.
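Something like this minimal Python sketch would do it. The folder path, search string, and output filename below are placeholders, and it assumes a plain substring match is what you want (not an XML-aware query):

```python
import os

def collect_matches(target_dir, search_string, output_file):
    """Single pass over target_dir: append every XML file containing
    search_string to one combined output file, wrapped in a root element
    so the result is a single well-formed XML document."""
    with open(output_file, "w", encoding="utf-8") as out:
        out.write("<matches>\n")
        for root, _dirs, files in os.walk(target_dir):
            for name in files:
                if not name.endswith(".xml"):
                    continue
                path = os.path.join(root, name)
                # errors="replace" keeps the pass going on odd encodings
                with open(path, encoding="utf-8", errors="replace") as f:
                    text = f.read()
                if search_string in text:
                    # Note: per-file <?xml ...?> declarations, if present,
                    # would need stripping here for the output to be valid XML.
                    out.write(text)
                    out.write("\n")
        out.write("</matches>\n")

if __name__ == "__main__":
    # Placeholder values -- substitute your own.
    collect_matches("xml_folder", "needle", "matches.xml")
```

Memory use stays flat at one file at a time, so no batching or database clearing is needed regardless of how many millions of files are in the folder.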

R

On Mon, Aug 5, 2019, 08:42 Greg Kawchuk <greg.kawc...@ualberta.ca> wrote:

> Hi everyone,
> I'm wondering if someone could provide what I think is a brief script for
> a scientific project to do the following.
> The goal is to identify XML documents from a very large collection that
> would be too big to load into a database all at once.
>
> Here is how I see the functions provided by the code.
> 1. In the script, the user could enter the path of the target folder (with
> millions of XML documents).
> 2. In the script, the user would enter the number of documents to load
> into a database at a given time (i = 1,000) depending on memory
> limitations.
> 3. The code would then create a temporary database from the first (i) xml
> files in the target folder.
> 4. The code would then search the 1000 xml documents in the database for a
> pre-defined text string.
> 5. If hits exist for the text string, the code would write those documents
> to a unique XML file.
> 6. Clear the database.
> 7. Read in the next 1000 files (or remaining files in the folder).
> 8. Return to #4.
>
> There would be no need to append XML files in step 5. The resulting XML
> files could be concatenated afterwards.
> Thank you in advance. If you have any questions, please feel free to email
> me here.
> Greg
>
> ***************************************************
> Greg Kawchuk BSC, DC, MSc, PhD.
> Professor, Faculty of Rehabilitation Medicine
> University of Alberta
> greg.kawc...@ualberta.ca
> 780-492-6891
>