Hi Julie,
Yes, for maximum performance you should probably use a pure CSV format.
I would suggest skipping the xmlDoc altogether. Instead, you will need to
create a new class called metro_csv2metro. That class would replace
metro_dom2metro (metro_dom2metro.py). That way, you would skip all of the
XML code.
In the metro_config.py file, the execution sequence would have to be
changed. The following sequence:
dConfig['INIT_MODULE_EXECUTION_SEQUENCE'] = \
    {'VALUE' :["metro_read_forecast",           # read the file and put it in a string
               "metro_validate_forecast",       # validate the well-formedness of the XML file (optional)
               "metro_string2dom_forecast",     # create an XML DOM from the string
               "metro_read_observation",        # read the file and put it in a string
               "metro_validate_observation",    # validate the well-formedness of the XML file (optional)
               "metro_string2dom_observation",  # create an XML DOM from the string
               "metro_read_station",            # read the file and put it in a string
               "metro_validate_station",        # validate the well-formedness of the XML file (optional)
               "metro_string2dom_station",      # create an XML DOM from the string
               "metro_dom2metro",               # convert all XML DOMs to the METRo data structure
               ...
would become:
dConfig['INIT_MODULE_EXECUTION_SEQUENCE'] = \
    {'VALUE' :["metro_read_forecast",           # read the file and put it in a string
               "metro_validate_forecast",       # validate the well-formedness of the CSV file (optional)
               "metro_string2csv_forecast",     # convert the string to an intermediate format
               "metro_read_observation",        # read the file and put it in a string
               "metro_validate_observation",    # validate the well-formedness of the CSV file (optional)
               "metro_string2csv_observation",  # convert the string to an intermediate format
               "metro_read_station",            # read the file and put it in a string
               "metro_validate_station",        # validate the well-formedness of the CSV file (optional)
               "metro_string2csv_station",      # convert the string to an intermediate format
               "metro_csv2metro",               # convert the intermediate format to the METRo data structure
               ...
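To give an idea of what the new per-file CSV steps might do, here is a
minimal sketch (Python 3, standard csv module) of a validation step and a
string-to-intermediate-format step. The function names validate_csv and
string2csv are hypothetical, not existing METRo code, and "validation" here
only means checking that every non-empty line has the expected number of
columns; the intermediate format is assumed to be a plain list of rows.

```python
import csv
import io

# Hypothetical helpers illustrating the "validate" and "string2csv" steps.
# These names are NOT part of METRo; they only sketch the idea.

def validate_csv(sText, iNbColumns):
    """Return True if every non-empty line has exactly iNbColumns fields."""
    # csv.reader yields an empty list for blank lines, so those are skipped.
    return all(len(lRow) == iNbColumns
               for lRow in csv.reader(io.StringIO(sText)) if lRow)

def string2csv(sText):
    """Split the file contents into a list of rows (lists of strings)."""
    lRows = []
    for lRow in csv.reader(io.StringIO(sText)):
        # Skip blank lines so a trailing newline does not create a bogus row.
        if lRow:
            lRows.append([sField.strip() for sField in lRow])
    return lRows
```

Keeping the rows as strings at this stage leaves type conversion to the
final metro_csv2metro step, where the configuration tells you what each
column should be.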
That is how I would do it. I would also try to reuse the metro_config file
as much as possible when reading your CSV file. For example, I would order
the columns of the forecast CSV data as given by:
--------------------------------------------------------------------------------------------------
dConfig['XML_FORECAST_PREDICTION_STANDARD_ITEMS'] = \
    {'VALUE' :[{'NAME':"FORECAST_TIME",
                'XML_TAG':"forecast-time",
                'DATA_TYPE':"DATE"},
               {'NAME':"WS",
                'XML_TAG':"ws",
                'DATA_TYPE':"REAL"},
               {'NAME':"AP",
                'XML_TAG':"ap",
                'DATA_TYPE':"REAL"},
               {'NAME':"AT",
                'XML_TAG':"at",
                'DATA_TYPE':"REAL"},
               {'NAME':"TD",
                'XML_TAG':"td",
                'DATA_TYPE':"REAL"},
               {'NAME':"CC",
                'XML_TAG':"cc",
                'DATA_TYPE':"INTEGER"},
               {'NAME':"SN",
                'XML_TAG':"sn",
                'DATA_TYPE':"REAL"},
               {'NAME':"RA",
                'XML_TAG':"ra",
                'DATA_TYPE':"REAL"},
               ],
     'FROM' :CFG_INTERNAL,
     'COMMENTS' :_("standard forecast prediction items")}
----------------------------------------------------------------------------------
I would also use the 'DATA_TYPE' field to convert each string to the right type.
I hope this helps. If you have any questions, feel free to contact me.
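A minimal sketch of that type conversion, assuming each CSV row lists its
columns in the order of XML_FORECAST_PREDICTION_STANDARD_ITEMS and that
dates use the same "2004-01-30T12:00Z" format as the XML files. The names
row2dict and dConverters are hypothetical; a real metro_csv2metro would
still have to store the converted values in the METRo data structure.

```python
from datetime import datetime

# Column order and types copied from XML_FORECAST_PREDICTION_STANDARD_ITEMS
# (XML_TAG dropped, since a CSV file has no tags).
lForecastItems = [
    {'NAME': "FORECAST_TIME", 'DATA_TYPE': "DATE"},
    {'NAME': "WS", 'DATA_TYPE': "REAL"},
    {'NAME': "AP", 'DATA_TYPE': "REAL"},
    {'NAME': "AT", 'DATA_TYPE': "REAL"},
    {'NAME': "TD", 'DATA_TYPE': "REAL"},
    {'NAME': "CC", 'DATA_TYPE': "INTEGER"},
    {'NAME': "SN", 'DATA_TYPE': "REAL"},
    {'NAME': "RA", 'DATA_TYPE': "REAL"},
]

# Hypothetical mapping from DATA_TYPE to a conversion function.
# Assumption: dates are written like "2004-01-30T12:00Z".
dConverters = {
    "DATE": lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%MZ"),
    "REAL": float,
    "INTEGER": int,
}

def row2dict(lRow, lItems):
    """Convert one CSV row (list of strings) into a dict keyed by item NAME."""
    if len(lRow) != len(lItems):
        raise ValueError("expected %d columns, got %d"
                         % (len(lItems), len(lRow)))
    return {dItem['NAME']: dConverters[dItem['DATA_TYPE']](sValue)
            for dItem, sValue in zip(lItems, lRow)}
```

Driving the conversion from the configuration means that adding or
reordering a forecast item only requires changing metro_config.py, not the
parsing code.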
François
Julie Prestopnik wrote:
> Hi François. Regarding the mixture of XML and CSV, I'm guessing we
> probably wouldn't get much of an increase in performance that way, but
> I'm not sure. I say that only because if all of the XML code still has
> to execute, it seems that it would take a similar amount of time as it
> does now. I think the CSV format we had in mind would be similar to
> what you have below, except without the XML.
>
> Digging deeper in the code, it looks like the data is stored in an
> xmlDoc object. I'm thinking we would need to create an xmlDoc object
> with our CSV data, but I'm not sure about that either. From what you
> know of the code, does that seem accurate? Do you have any
> input/suggestions about that?
>
> Thanks for your help,
> Julie
>
> Francois Fortin wrote:
>
>> Hi Julie,
>> I thought I could easily add a CSV parser to my code. The format would
>> have been a mixture of XML and CSV. Here is an example:
>>
>> <?xml version="1.0"?>
>> <forecast>
>>   <header>
>>     <version>1.1</version>
>>     <production-date>2004-01-30T12:00Z</production-date>
>>   </header>
>>   <prediction-list>
>>     <prediction>2004-01-30T12:00Z,22,1,-11.00,-14.00,0,0.00,0.00,984.00</prediction>
>>     <prediction>2004-01-30T13:00Z,20,1,-10.00,-13.00,7,0.00,0.00,988.00</prediction>
>>     <prediction>2004-01-30T14:00Z,20,1,-10.00,-13.00,7,0.00,0.00,988.00</prediction>
>>     <prediction>2004-01-30T15:00Z,20,1,-9.00,-13.00,7,0.00,0.00,988.00</prediction>
>>     <prediction>2004-01-30T16:00Z,20,1,-8.00,-13.00,7,0.00,0.00,989.00</prediction>
>>     <prediction>2004-01-30T17:00Z,20,1,-7.00,-12.00,7,0.00,0.00,989.00</prediction>
>>   </prediction-list>
>> </forecast>
>>
>> Unfortunately, this requires a bigger effort than I thought. Also, I'm
>> not sure you would gain enough performance from that modification. Do
>> you think it would be OK?
>>
>> Do you have an idea what your CSV format would be?
>>
>>
>>
>>
>> Julie Prestopnik wrote:
>>
>>> Thanks, François! I'll look forward to getting your suggestions.
>>>
>>> Julie
>>>
>>> Francois Fortin wrote:
>>>
>>>
>>>> Hi Julie,
>>>>
>>>> From now on, we will reply to [EMAIL PROTECTED]. This way, our
>>>> discussion will be broadcast to the other developers.
>>>>
>>>> I would prefer method 1 or 2. If you prefer method 3, it's not a
>>>> problem, but maybe we can also add method 2 to force a particular file type.
>>>>
>>>> We would like to integrate it in METRo. It would be a great feature to
>>>> have.
>>>>
>>>> I will look at the source code a bit before sending you my suggestion. I
>>>> will try to do that by the end of the week.
>>>>
>>>> Thanks
>>>> François
>>>>
>>>> Julie Prestopnik wrote:
>>>>
>>>>
>>>>> Hi François. I'm working on the design right now. My plan was to use
>>>>> the Python debugger to step through the METRo modules from the very
>>>>> beginning of program execution to see which modules would need to be
>>>>> modified or added.
>>>>>
>>>>> My co-workers and I discussed the various options for letting METRo
>>>>> know that it is being given a new type of input/output file, and we
>>>>> came up with three possible ways:
>>>>>
>>>>> 1. Changing the command line options. For example, having
>>>>> "--input-forecast-csv filename" instead of "--input-forecast filename"
>>>>> (Note the addition of the -csv at the end of the option)
>>>>>
>>>>> 2. Adding another option to METRo. For example, "--input-file-format
>>>>> filetype" and "--output-file-format filetype", with filetype being
>>>>> either "xml" or "csv".
>>>>>
>>>>> 3. Pushing the check further down. For example, once the filenames are
>>>>> loaded in, check for a .xml or .csv extension and proceed as necessary.
>>>>>
>>>>> I don't think we really have a preference on which one to use.
>>>>> Personally, I'm somewhat fond of the third option because the
>>>>> interface to METRo would not change; however, it restricts the user
>>>>> in their choice of input and output filenames. Do you have a preference?
>>>>>
>>>>> Regarding your question of knowing exactly what to do to allow for the
>>>>> different input format, like I said, I've started using the debugger to
>>>>> step through the code and have been making notes along the way of what
>>>>> code appears to need modification. Then, I was going to dive in, start
>>>>> making changes, and hope for the best. ;)
>>>>>
>>>>> Any suggestions from you would certainly be appreciated. Is this
>>>>> something you would consider integrating into and maintaining in
>>>>> MetSurface if I get it working?
>>>>>
>>>>> Thanks,
>>>>> Julie
>>>>>
>>>>> Francois Fortin wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi Julie,
>>>>>> Do you have an idea how to do that? I have one, but I will need some
>>>>>> time to look at it.
>>>>>>
>>>>>> Julie Prestopnik wrote:
>>>>>>
>>>>>>
>>>>>>> Hello METRo developers.
>>>>>>>
>>>>>>> We (NCAR) are considering adding an option to allow for a new input
>>>>>>> format (CSV) to METRo, since the XML I/O consumes most of the
>>>>>>> processing time. We'd like to add an option, either by adding to or
>>>>>>> changing the command line interface or by pushing the check further
>>>>>>> down (e.g. checking for file extension: .xml or .csv).
>>>>>>>
>>>>>>> We wanted to run this by everyone to see if you might have any input,
>>>>>>> suggestions, or objections.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Julie
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> METRo-developers mailing list
>>>>>>> [email protected]
>>>>>>> https://mail.gna.org/listinfo/metro-developers
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
>
>
--
François Fortin
Programmeur analyste scientifique
(514)421-7245