XML parsing (loads of data question)...

2000-09-21 Thread David Bovill

I have a large amount of XML data (say 30-100MB) that I need to parse.
My first thought was to load it into an array at start-up, then get the
keys and loop through each key (using repeat for each line), testing to
see if there is a match and returning the value if there is one.

However, I figure that this would nearly double the amount of memory I
would need, and the full index of keys would be almost as large as the
data itself. As there is no syntax for referring to elements of an
associative array by numerical index (i.e. get the first, second, etc.),
what would be the fastest and most memory-efficient technique? Should I
use an external database like "Pandora", or whatever the name beginning
with P I am searching for is? Maybe I need my own relational database :-)
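
The loop described above can be sketched in MetaTalk as follows (the array
name gData and the handler name are hypothetical, not from the original
post):

```metatalk
-- assumes gData is an array already built from the parsed XML,
-- with one record stored per key
function lookupRecord pName
  global gData
  repeat for each line tKey in the keys of gData
    if tKey is pName then return gData[tKey]
  end repeat
  return empty
end lookupRecord
```

Note that when the key is known exactly, gData[pName] fetches the element
directly without scanning the keys at all, since array look-up is hashed.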


Archives: http://www.mail-archive.com/metacard%40lists.best.com/
Info: http://www.xworlds.com/metacard/mailinglist.htm
Please send bug reports to [EMAIL PROTECTED], not this list.




Re: XML parsing (loads of data question)...

2000-09-21 Thread Gary Rathbone

I'm looking towards Valentina (http://www.paradigmasoft.com) as a
relational database solution for MC. However, you need to get the data
from the format it's in into a format suitable for Valentina import.

A solution I've used in the past is to index the records into a file
structure based on the key, or keys.

A converter would be written to split the data into directories and
subdirectories so that smaller files exist in defined locations.

e.g. to search for the surname "Davis", MC could look in
C:/d/a/v/info.dat, which should be a relatively small file containing
names such as Davis, Davies, Davison, Davidson, etc., and is therefore
much easier and quicker to handle.
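
The path computation described above could be sketched like this (the
handler name and the three-level directory depth are assumptions for
illustration):

```metatalk
-- build the bucket path for a surname,
-- e.g. "Davis" -> C:/d/a/v/info.dat
function bucketPath pSurname
  put empty into tPath
  repeat with i = 1 to 3
    put "/" & char i of toLower(pSurname) after tPath
  end repeat
  return "C:" & tPath & "/info.dat"
end bucketPath
```

The converter would append each record to the file returned by
bucketPath, and a search would then only need to read that one small file.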

Other issues include writing to this file-structure 'database' and how
often you need to convert the original/updated data.

Gary Rathbone


on 9/21/00 8:48 AM, David Bovill at [EMAIL PROTECTED] wrote:

 --snip--






Re: XML parsing (loads of data question)...

2000-09-21 Thread David Bovill

Thanks Gary...

 From: Gary Rathbone [EMAIL PROTECTED]
 Reply-To: [EMAIL PROTECTED]
 Date: Thu, 21 Sep 2000 14:23:55 +0100
 To: [EMAIL PROTECTED]
 Subject: Re: XML parsing (loads of data question)...
 
 I'm looking towards Valentina (http://www.paradigmasoft.com) for a
 relational database solution for MC.
 

What implications does working with Valentina have for memory usage? I ask
because one of the reasons for converting the project from Java is to get it
working on Macs with under 50MB of RAM.

 However you need to get the data from
 the format its in, to the format suitable for Valentina import.


What sort of format is required? Tab delimited?
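
Assuming a tab-delimited import is indeed what is wanted (the thread does
not confirm this), the conversion could be sketched as follows (handler
name and global array name are hypothetical):

```metatalk
-- hypothetical sketch: flatten the global record array into
-- tab-delimited lines, one record per line
function recordsToTabDelimited
  global gData
  put empty into tOut
  repeat for each line tKey in the keys of gData
    put tKey & tab & gData[tKey] & return after tOut
  end repeat
  return tOut
end recordsToTabDelimited
```

The result could then be written to a file with the put ... into URL form
and handed to the database's import facility.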


 A solution I've use in the past is to index the records into a file
 structure depending on the key, or keys.
 
 A converter would be written to split the data into directories and
 subdirectories so that smaller files exist in defined locations.
 
 eg to search on a surname "Davis" MC could look into C:/d/a/v/info.dat
 which should contain a relatively small file with names such as Davis,
 Davies, Davison, Davidson etc. and therefore much easier and quicker to
 handle.
 
 Other issues writing to this file structure 'database' and/or how often you
 need to convert the original/updated data.
 

Thanks for the tip.

 
 --snip--






Re: XML parsing (loads of data question)...

2000-09-21 Thread Gary Rathbone

I can't help too much on the Valentina specifics as I'm evaluating it
myself. I suggest you take a look at the site (http://www.paradigmasoft.com)
and ask the guys there (they've been very helpful). I know Scott endorses it
...

Post dated 3/9/2000
--snip-- Anyone building (or contemplating building) applications that
require managing more than a few MB of data or that require sophisticated
query support should download [valentina] --snip--

 --snip-- This is a key technology for MetaCard developers to have in their
arsenal and we need to do what it takes to make sure that it works well.
Regards, Scott [Raney] --snip--

on 21/9/00 15:20, David Bovill at [EMAIL PROTECTED] wrote:

 --snip--

