> I find myself in a situation where I need to build a tool to analyse
> lots of XML data. Thousands of records containing a lot of strings as
> well as numeric values.

   When I found myself in this situation, I did two things:

1. Don't use XML; it is way too heavy for this much data. I found that by using a double-delimited or fixed-width data format, the file size was reduced by as much as 70%. In the end, I went with fixed width because I could parse it faster (by avoiding calling split() thousands of times).

Now, I still used the XML object, but instead of letting it parse the file, I overrode the onData handler and used my own parsing function, which generated objects directly instead of building an XML tree. Essentially, the XML object just read the data in and handed it to my parsing function (there's a rough sketch of this after the list).

2. Don't try to parse it all at once. What I did was dump it all into a buffer when it was loaded, and then fire off a parsing function that parsed 250 records per frame (the second sketch after the list shows the idea). I found that number through trial and error; you can find your own balance. The important thing was that the application didn't stop functioning while the records were being parsed: you could go to other areas of the app and use them normally, and when you went to the section that required the data, you got a progress bar showing how many records had been parsed.

My parsing function was semi-complicated. It took the whole dataset in as a string and split it on my record delimiter, and that array became my buffer. This way I knew how many records there were to parse, and approximately how long it would take to parse them. It then sliced 250 records off the top of the buffer on every frame and passed them to the serialization function, which turned each record into an object and inserted it into my "database" object. My parsing function also built several indexes while it was parsing the records, to make lookups faster once the database was ready. My application was a database of hotels, sortable by a number of criteria, so the parsing routine looked for those attributes of each hotel as it parsed, and when it saw a new value for one of those criteria, it made a new entry in the appropriate index.
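
To give you an idea of point 1, here's a rough ActionScript 2 sketch: overriding onData means Flash never tries to build an XML tree, and substr() pulls fixed-width fields out of each record without any per-field split() calls. The file name, field names, and widths below are made up for illustration, not the actual layout:

// Sketch only -- file name and field layout are illustrative
var loader:XML = new XML();

// Overriding onData means we get the raw file contents instead of
// Flash parsing them into an XML tree
loader.onData = function(src:String):Void {
    if (src == undefined) {
        trace("load failed");
        return;
    }
    startParsing(src); // dump the raw fixed-width text into the parsing routine
};
loader.load("hotels.dat");

// Pull fixed-width fields out of one record with substr() --
// offsets and lengths here are hypothetical
function parseRecord(rec:String):Object {
    return {
        id:      Number(rec.substr(0, 6)),
        name:    rec.substr(6, 40),
        country: rec.substr(46, 20),
        state:   rec.substr(66, 20),
        city:    rec.substr(86, 20)
    };
}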
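
And a rough sketch of point 2, the per-frame loop; it assumes the parseRecord() sketch above, frame script on _root, and the startParsing() function the loader calls. The names are illustrative, and 250 is just the number I settled on:

// Sketch only -- assumes AS2 frame script on _root
var buffer:Array;
var total:Number;
var Database:Object = {};
var Index:Object = { Location: {} };

function startParsing(src:String):Void {
    buffer = src.split("\n"); // one split on the record delimiter (assumed newline here)
    total = buffer.length;    // lets us show how far along the parsing is
    _root.onEnterFrame = parseChunk;
}

function parseChunk():Void {
    var chunk:Array = buffer.splice(0, 250); // next 250 records off the top
    for (var i:Number = 0; i < chunk.length; i++) {
        var hotel:Object = parseRecord(chunk[i]);
        Database[hotel.id] = hotel; // the "database" object
        indexHotel(hotel);          // build the indexes as we go (sketched further down)
    }
    var progress:Number = (total - buffer.length) / total; // drives the progress bar
    if (buffer.length == 0) {
        delete _root.onEnterFrame; // done parsing
    }
}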

   I made very heavy use of the object collection syntax, for example:

Index["Location"]["USA"]["Texas"]["Dallas"]

...referred to an array of the IDs of hotels in Dallas, Texas, USA, which could be used to find a hotel like this:

// 0 is the first index in the array of IDs
var hotelID:Number = Index["Location"]["USA"]["Texas"]["Dallas"][0];
return Database[hotelID];
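
To round that out, here's roughly how a nested index entry like that might get created lazily as the records are parsed; the indexHotel() name and fields are illustrative, not the actual code:

// Sketch only -- creates each level of the index the first time a value is seen
function indexHotel(hotel:Object):Void {
    var loc:Object = Index["Location"];
    if (loc[hotel.country] == undefined) {
        loc[hotel.country] = {};
    }
    if (loc[hotel.country][hotel.state] == undefined) {
        loc[hotel.country][hotel.state] = {};
    }
    if (loc[hotel.country][hotel.state][hotel.city] == undefined) {
        loc[hotel.country][hotel.state][hotel.city] = [];
    }
    // e.g. Index["Location"]["USA"]["Texas"]["Dallas"].push(hotel.id)
    loc[hotel.country][hotel.state][hotel.city].push(hotel.id);
}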

In the end, it took about five times as much code to import, parse, and index the database as the whole rest of the application, but it worked, it was relatively fast, and it met the requirements I was given. I would've preferred for it to work from a web server, selecting what I needed from the database, but the client required that it work offline from a database that shipped on the CD, as well as be able to download an updated database from their website, and this was the best solution I could find in Flash that worked on both PC and Mac (no third-party wrappers). Unfortunately, it had to parse the whole database every time you ran the app, but it would get the newest version from the web if you were online, and it gave you the option to store it (in an ungodly-sized shared object) if you wanted to.

Anyway, that's how I did it; whether or not it was successful is a matter of opinion. ;-)

ryanm