Hi Julie,
For the XMLDOM, it's a regular DOM. The *Document Object Model* (*DOM*)
is a platform- and language-independent
<http://en.wikipedia.org/wiki/Programming_language> standard
object model <http://en.wikipedia.org/wiki/Object_model> for
representing HTML <http://en.wikipedia.org/wiki/HTML> or XML
<http://en.wikipedia.org/wiki/XML> and related formats.
I don't think you would want to use that. It's a big data structure, and
you would lose performance. We use a DOM because it represents our XML
input in memory.
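
To give an idea of what a DOM holds, here is a minimal sketch using
Python's standard xml.dom.minidom module. It is only an illustration:
METRo's own XML library and the exact layout of the forecast file may
differ, and the sample document and the function name
read_air_temperatures are made up for the example.

```python
from xml.dom import minidom

# Hypothetical, shortened forecast document. The tag names "forecast-time"
# and "at" come from the XML_TAG entries in metro_config.py; the nesting
# shown here is an assumption.
SAMPLE_XML = """<?xml version="1.0"?>
<forecast>
  <prediction-list>
    <prediction>
      <forecast-time>2004-01-30T12:00Z</forecast-time>
      <at>-11.00</at>
    </prediction>
    <prediction>
      <forecast-time>2004-01-30T13:00Z</forecast-time>
      <at>-10.00</at>
    </prediction>
  </prediction-list>
</forecast>"""


def read_air_temperatures(xml_string):
    """Return (forecast-time, air temperature) pairs read from the DOM."""
    # The whole document is parsed into a tree of node objects in memory.
    dom = minidom.parseString(xml_string)
    result = []
    for prediction in dom.getElementsByTagName("prediction"):
        time_node = prediction.getElementsByTagName("forecast-time")[0]
        at_node = prediction.getElementsByTagName("at")[0]
        # Element text lives in a child text node, not on the element itself.
        result.append((time_node.firstChild.data,
                       float(at_node.firstChild.data)))
    return result


if __name__ == "__main__":
    for forecast_time, air_temp in read_air_temperatures(SAMPLE_XML):
        print(forecast_time, air_temp)
```

Every element, attribute and piece of text becomes its own object in the
tree, which is why the structure gets large for long forecasts.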
What you need to do is create your own intermediate format and fill it.
That will be the task performed by your "metro_string2csv" module. Maybe
you should choose a better name for that module.
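
A minimal sketch of such an intermediate format, assuming one CSV line
per prediction with the fields in the order of
XML_FORECAST_PREDICTION_STANDARD_ITEMS from metro_config.py (quoted
further down). The function name string2csv_forecast, the CONVERTERS
table, the date format and the sample line are hypothetical, not
existing METRo code:

```python
import csv
from datetime import datetime

# Item order and data types copied from the metro_config.py excerpt quoted
# below; the XML_TAG entries are left out because they are not needed here.
FORECAST_ITEMS = [
    {"NAME": "FORECAST_TIME", "DATA_TYPE": "DATE"},
    {"NAME": "WS", "DATA_TYPE": "REAL"},
    {"NAME": "AP", "DATA_TYPE": "REAL"},
    {"NAME": "AT", "DATA_TYPE": "REAL"},
    {"NAME": "TD", "DATA_TYPE": "REAL"},
    {"NAME": "CC", "DATA_TYPE": "INTEGER"},
    {"NAME": "SN", "DATA_TYPE": "REAL"},
    {"NAME": "RA", "DATA_TYPE": "REAL"},
]

# Assumed mapping from DATA_TYPE to a conversion function. The date format
# matches the timestamps used in this thread (e.g. 2004-01-30T12:00Z).
CONVERTERS = {
    "DATE": lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%MZ"),
    "REAL": float,
    "INTEGER": int,
}

# Hypothetical sample input: one line per prediction, fields in config order.
SAMPLE_CSV = (
    "2004-01-30T12:00Z,22,984.00,-11.00,-14.00,0,0.00,0.00\n"
    "2004-01-30T13:00Z,20,988.00,-10.00,-13.00,7,0.00,0.00\n"
)


def string2csv_forecast(text, items=FORECAST_ITEMS):
    """Convert a CSV string into a list of dicts, one per prediction."""
    rows = []
    for line_no, fields in enumerate(csv.reader(text.splitlines()), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(items):
            raise ValueError("line %d: expected %d fields, got %d"
                             % (line_no, len(items), len(fields)))
        row = {}
        for item, raw in zip(items, fields):
            convert = CONVERTERS[item["DATA_TYPE"]]
            row[item["NAME"]] = convert(raw.strip())
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in string2csv_forecast(SAMPLE_CSV):
        print(row["FORECAST_TIME"], row["AT"], row["CC"])
```

A plain list of dicts keeps the structure small and needs no XML code at
all. A later module such as metro_csv2metro could then copy these values
into the METRo data structure.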
metro_infdata and metro_infdata_container are used to pass information
between the modules. I will give you more information about the whole
data structure in another mail. I will have to do a little bit of
documentation.
François
Julie Prestopnik wrote:
> Hi François. Thanks for your help. I'm looking into this more now.
>
> Can you please explain what the code in data_module/metro_infdata.py and
> data_module/metro_infdata_container.py does?
>
> Also, in your notes it says that metro_string2dom (forecast, observation
> and station) "creates a XMLDOM from a string", and you indicate that we
> would need to create a metro_string2csv set of files to convert the
> string into an intermediate format. Can you describe in more detail
> what a XMLDOM looks like/contains?
>
> Thanks for your help in advance.
>
> Julie
>
> Francois Fortin wrote:
>
>> Hi Julie,
>> Yes for maximum performance, you should probably have a complete CSV
>> format.
>>
>> To maximize performance, I would suggest skipping the xmlDoc. Instead you
>> will need to create a new class called metro_csv2metro. That class would
>> replace metro_dom2metro (metro_dom2metro.py). That way, you would skip
>> all the XML code.
>>
>> In the metro_config.py file, the execution sequence would have to be
>> changed:
>>
>> dConfig['INIT_MODULE_EXECUTION_SEQUENCE'] = \
>>     {'VALUE': [
>>         "metro_read_forecast",          # read the file and put it in a string
>>         "metro_validate_forecast",      # validate the well-formedness of the XML file (optional)
>>         "metro_string2dom_forecast",    # create an XMLDOM from a string
>>         "metro_read_observation",       # read the file and put it in a string
>>         "metro_validate_observation",   # validate the well-formedness of the XML file (optional)
>>         "metro_string2dom_observation", # create an XMLDOM from a string
>>         "metro_read_station",           # read the file and put it in a string
>>         "metro_validate_station",       # validate the well-formedness of the XML file (optional)
>>         "metro_string2dom_station",     # create an XMLDOM from a string
>>         "metro_dom2metro",              # convert all XMLDOMs to the metro data structure
>>         ...
>>
>> would become:
>>
>> dConfig['INIT_MODULE_EXECUTION_SEQUENCE'] = \
>>     {'VALUE': [
>>         "metro_read_forecast",          # read the file and put it in a string
>>         "metro_validate_forecast",      # validate the well-formedness of the CSV file (optional)
>>         "metro_string2csv_forecast",    # convert string to intermediate format
>>         "metro_read_observation",       # read the file and put it in a string
>>         "metro_validate_observation",   # validate the well-formedness of the CSV file (optional)
>>         "metro_string2csv_observation", # convert string to intermediate format
>>         "metro_read_station",           # read the file and put it in a string
>>         "metro_validate_station",       # validate the well-formedness of the CSV file (optional)
>>         "metro_string2csv_station",     # convert string to intermediate format
>>         "metro_csv2metro",              # convert intermediate format to the metro data structure
>>         ...
>>
>> I would do it that way. I would also try to reuse the metro_config file
>> as much as possible to read your CSV file. I would, for example, put the
>> forecast fields in my CSV data in the order provided by:
>>
>> --------------------------------------------------------------------------------------------------
>>
>> dConfig['XML_FORECAST_PREDICTION_STANDARD_ITEMS'] = \
>> {'VALUE' :[{'NAME':"FORECAST_TIME",
>> 'XML_TAG':"forecast-time",
>> 'DATA_TYPE':"DATE"},
>>
>> {'NAME':"WS",
>> 'XML_TAG':"ws",
>> 'DATA_TYPE':"REAL"},
>>
>> {'NAME':"AP",
>> 'XML_TAG':"ap",
>> 'DATA_TYPE':"REAL"},
>>
>> {'NAME':"AT",
>> 'XML_TAG':"at",
>> 'DATA_TYPE':"REAL"},
>>
>> {'NAME':"TD",
>> 'XML_TAG':"td",
>> 'DATA_TYPE':"REAL"},
>>
>> {'NAME':"CC",
>> 'XML_TAG':"cc",
>> 'DATA_TYPE':"INTEGER"},
>>
>> {'NAME':"SN",
>> 'XML_TAG':"sn",
>> 'DATA_TYPE':"REAL"},
>>
>> {'NAME':"RA",
>> 'XML_TAG':"ra",
>> 'DATA_TYPE':"REAL"},
>> ],
>> 'FROM' :CFG_INTERNAL,
>> 'COMMENTS' :_("standard forecast prediction items")}
>> ----------------------------------------------------------------------------------
>>
>> I would also use the 'DATA_TYPE' to convert string to the right type.
>>
>> I hope this helps. If you have any questions, feel free to contact me.
>> François
>>
>>
>> Julie Prestopnik wrote:
>>
>>> Hi François. Regarding the mixture of XML and CSV, I'm guessing we
>>> probably wouldn't get much of an increase in performance that way, but
>>> I'm not sure. I say that only because if all of the XML code still has
>>> to execute, it seems that it would take a similar amount of time as it
>>> does now. I think the CSV format we had in mind would be similar to
>>> what you have below, except without the XML.
>>>
>>> Digging deeper in the code, it looks like the data is stored in an
>>> xmlDoc object. I'm thinking we would need to create an xmlDoc object
>>> with our CSV data, but I'm not sure about that either. From what you
>>> know of the code, does that seem accurate? Do you have any
>>> input/suggestions about that?
>>>
>>> Thanks for your help,
>>> Julie
>>>
>>> Francois Fortin wrote:
>>>
>>>
>>>> Hi Julie,
>>>> I thought I could easily add a CSV parser to my code. The format would
>>>> have been a mixture of XML and CSV. Here is an example:
>>>>
>>>> <?xml version="1.0"?>
>>>> <forecast>
>>>> <header>
>>>> <version>1.1</version>
>>>> <production-date>2004-01-30T12:00Z</production-date>
>>>> </header>
>>>> <prediction-list>
>>>> <prediction>2004-01-30T12:00Z,22,1,-11.00,-14.00,0,0.00,0.00,984.00</prediction>
>>>> <prediction>2004-01-30T13:00Z,20,1,-10.00,-13.00,7,0.00,0.00,988.00</prediction>
>>>> <prediction>2004-01-30T14:00Z,20,1,-10.00,-13.00,7,0.00,0.00,988.00</prediction>
>>>> <prediction>2004-01-30T15:00Z,20,1,-9.00,-13.00,7,0.00,0.00,988.00</prediction>
>>>> <prediction>2004-01-30T16:00Z,20,1,-8.00,-13.00,7,0.00,0.00,989.00</prediction>
>>>> <prediction>2004-01-30T17:00Z,20,1,-7.00,-12.00,7,0.00,0.00,989.00</prediction>
>>>> </prediction-list>
>>>> </forecast>
>>>>
>>>> Unfortunately this requires a bigger effort than I thought. Also, I'm not
>>>> sure you would gain enough performance from that modification. Do you
>>>> think it would be OK?
>>>>
>>>> Do you have an idea what your CSV format would be?
>>>>
>>>>
>>>>
>>>>
>>>> Julie Prestopnik wrote:
>>>>
>>>>
>>>>> Thanks, François! I'll look forward to getting your suggestions.
>>>>>
>>>>> Julie
>>>>>
>>>>> Francois Fortin wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi Julie,
>>>>>>
>>>>>> From now on we will reply to [EMAIL PROTECTED]. This way our
>>>>>> discussion will be broadcast to the other developers.
>>>>>>
>>>>>> I would prefer method 1 or 2. If you prefer method 3, it's not a problem,
>>>>>> but maybe we can also add method 2 to force a particular file type.
>>>>>>
>>>>>> We would like to integrate it in METRo. It would be a great feature to
>>>>>> have.
>>>>>>
>>>>>> I will look at the source code a bit before sending you my suggestion.
>>>>>> I will try to do that by the end of the week.
>>>>>>
>>>>>> Thanks
>>>>>> François
>>>>>>
>>>>>> Julie Prestopnik wrote:
>>>>>>
>>>>>>
>>>>>>> Hi François. I'm working on the design right now. My plan was to use
>>>>>>> the Python debugger to step through the METRo modules from the very
>>>>>>> beginning of program execution to see what modules would need to be
>>>>>>> modified/added.
>>>>>>>
>>>>>>> My co-workers and I discussed the various options for letting METRo
>>>>>>> know that it is being given a new type of input/output file and we
>>>>>>> came up with three possible ways:
>>>>>>>
>>>>>>> 1. Changing the command line options. For example, having
>>>>>>> "--input-forecast-csv filename" instead of "--input-forecast filename".
>>>>>>> (Note the addition of -csv at the end of the option.)
>>>>>>>
>>>>>>> 2. Adding another option to METRo. For example, "--input-file-format
>>>>>>> filetype" and "--output-file-format filetype", with filetype being
>>>>>>> either "xml" or "csv".
>>>>>>>
>>>>>>> 3. Pushing the check further down. For example, once the filenames are
>>>>>>> loaded in, check for a .xml or .csv extension and proceed as necessary.
>>>>>>>
>>>>>>> I don't think we really have a preference on which one to use.
>>>>>>> Personally, I'm somewhat fond of the third option, because the
>>>>>>> interface to METRo would not change; however, it restricts the user
>>>>>>> in their choice of input and output filenames. Do you have a
>>>>>>> preference?
>>>>>>>
>>>>>>> Regarding your question of knowing exactly what to do to allow for
>>>>>>> the different input format, like I said, I've started using the
>>>>>>> debugger to step through the code and have been making notes along
>>>>>>> the way of what code appears to need modification. Then, I was going
>>>>>>> to dive in, start making changes, and hope for the best. ;)
>>>>>>>
>>>>>>> Any suggestions from you would certainly be appreciated. Is this
>>>>>>> something you would consider integrating into and maintaining in
>>>>>>> MetSurface if I get it working?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Julie
>>>>>>>
>>>>>>> Francois Fortin wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Hi Julie,
>>>>>>>> Do you have an idea how to do that? I have one but I will need some
>>>>>>>> time to look at that.
>>>>>>>>
>>>>>>>> Julie Prestopnik wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hello METRo developers.
>>>>>>>>>
>>>>>>>>> We (NCAR) are considering adding an option to allow for a new input
>>>>>>>>> format (CSV) to METRo, since the XML I/O consumes most of the
>>>>>>>>> processing time. We'd like to add an option, either by adding to or
>>>>>>>>> changing the command line interface or by pushing the check further
>>>>>>>>> down (e.g. checking for file extension: .xml or .csv).
>>>>>>>>>
>>>>>>>>> We wanted to run this by everyone to see if you might have any
>>>>>>>>> input, suggestions, or objections.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Julie
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> METRo-developers mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://mail.gna.org/listinfo/metro-developers
--
François Fortin
Programmeur analyste scientifique
(514)421-7245