Hi François.  Thanks for your help.  I'm looking into this more now.

Can you please explain what the code in data_module/metro_infdata.py and
data_module/metro_infdata_container.py does?

Also, in your notes it says that metro_string2dom (forecast, observation
and station) "creates a XMLDOM from a string", and you indicate that we
would need to create a metro_string2csv set of files to convert the
string into an intermediate format.  Can you describe in more detail
what an XMLDOM looks like/contains?

Thanks for your help in advance.

Julie
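
To illustrate the XMLDOM question above: a DOM is an in-memory tree of
element and text nodes built from the XML text. The sketch below is only
an illustration, not METRo code. It uses Python's standard
xml.dom.minidom as a stand-in (METRo's own XML library may differ), and
the per-value child elements inside <prediction> are an assumption based
on the XML_TAG names quoted later in this thread. FORECAST_XML and
text_of are hypothetical names.

```python
from xml.dom import minidom

# Hypothetical, shortened forecast document. Element names follow the
# forecast sample and the XML_TAG entries quoted later in this thread.
FORECAST_XML = """<?xml version="1.0"?>
<forecast>
  <header>
    <version>1.1</version>
    <production-date>2004-01-30T12:00Z</production-date>
  </header>
  <prediction-list>
    <prediction>
      <forecast-time>2004-01-30T12:00Z</forecast-time>
      <at>-11.00</at>
    </prediction>
  </prediction-list>
</forecast>"""


def text_of(parent, tag):
    """Return the text inside the first <tag> element found under parent."""
    node = parent.getElementsByTagName(tag)[0]
    return node.firstChild.data


# Parsing the string yields a Document node; its documentElement is the
# root <forecast> element, and every nested tag becomes a child node.
dom = minidom.parseString(FORECAST_XML)
root = dom.documentElement

version = text_of(root, "version")
predictions = root.getElementsByTagName("prediction")

# Values are stored as text, so they still need converting to numbers.
air_temp = float(text_of(predictions[0], "at"))

print(root.tagName, version, len(predictions), air_temp)
```

Every value lookup walks this tree, which is one plausible reason the
XML path costs more than reading flat CSV rows.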

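The reply quoted below suggests ordering the CSV columns like
XML_FORECAST_PREDICTION_STANDARD_ITEMS and using 'DATA_TYPE' to convert
each string. A minimal sketch of that idea follows, assuming one
forecast row per CSV line. FORECAST_ITEMS, CONVERTERS, parse_date,
parse_forecast_csv and the SAMPLE values are all hypothetical and not
part of METRo.

```python
import csv
import io
from datetime import datetime, timezone

# Names and types copied from the XML_FORECAST_PREDICTION_STANDARD_ITEMS
# list quoted below; XML_TAG is left out because CSV does not need it.
FORECAST_ITEMS = [
    {"NAME": "FORECAST_TIME", "DATA_TYPE": "DATE"},
    {"NAME": "WS", "DATA_TYPE": "REAL"},
    {"NAME": "AP", "DATA_TYPE": "REAL"},
    {"NAME": "AT", "DATA_TYPE": "REAL"},
    {"NAME": "TD", "DATA_TYPE": "REAL"},
    {"NAME": "CC", "DATA_TYPE": "INTEGER"},
    {"NAME": "SN", "DATA_TYPE": "REAL"},
    {"NAME": "RA", "DATA_TYPE": "REAL"},
]


def parse_date(text):
    """Parse timestamps shaped like 2004-01-30T12:00Z (assumed format)."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%MZ")
    return parsed.replace(tzinfo=timezone.utc)


# One converter per DATA_TYPE value used in the item list.
CONVERTERS = {"DATE": parse_date, "REAL": float, "INTEGER": int}


def parse_forecast_csv(text, items=FORECAST_ITEMS):
    """Convert CSV text into a list of dicts keyed by item NAME."""
    rows = []
    for lineno, fields in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(items):
            raise ValueError(
                "line %d: expected %d fields, got %d"
                % (lineno, len(items), len(fields))
            )
        rows.append({
            item["NAME"]: CONVERTERS[item["DATA_TYPE"]](value.strip())
            for item, value in zip(items, fields)
        })
    return rows


# Made-up sample rows in the column order of FORECAST_ITEMS.
SAMPLE = (
    "2004-01-30T12:00Z,22,984.00,-11.00,-14.00,0,0.00,0.00\n"
    "2004-01-30T13:00Z,20,988.00,-10.00,-13.00,7,0.00,0.00\n"
)

rows = parse_forecast_csv(SAMPLE)
print(rows[0]["FORECAST_TIME"], rows[0]["AT"], rows[1]["CC"])
```

Driving the conversion from the item list keeps the column order and
types in one place, so a new column would only need a new list entry.
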
Francois Fortin wrote:
> Hi Julie,
> Yes, for maximum performance you should probably use a complete CSV
> format.
> 
> To maximize performance, I would suggest skipping the xmlDoc. Instead,
> you will need to create a new class called metro_csv2metro. That class
> would replace metro_dom2metro (metro_dom2metro.py). That way, you would
> skip all of the XML code.
> 
> In the metro_config.py file, the execution sequence would have to be
> changed:
> 
>    dConfig['INIT_MODULE_EXECUTION_SEQUENCE'] = \
>        {'VALUE'   :[# read the file and put it in a string
>                     "metro_read_forecast",
>                     # validate the well-formedness of the XML file (optional)
>                     "metro_validate_forecast",
>                     # create an XMLDOM from a string
>                     "metro_string2dom_forecast",
>                     # read the file and put it in a string
>                     "metro_read_observation",
>                     # validate the well-formedness of the XML file (optional)
>                     "metro_validate_observation",
>                     # create an XMLDOM from a string
>                     "metro_string2dom_observation",
>                     # read the file and put it in a string
>                     "metro_read_station",
>                     # validate the well-formedness of the XML file (optional)
>                     "metro_validate_station",
>                     # create an XMLDOM from a string
>                     "metro_string2dom_station",
>                     # convert all XMLDOMs to the METRo data structure
>                     "metro_dom2metro",
>                     ...
> 
> would become:
> 
>    dConfig['INIT_MODULE_EXECUTION_SEQUENCE'] = \
>        {'VALUE'   :[# read the file and put it in a string
>                     "metro_read_forecast",
>                     # validate the well-formedness of the CSV file (optional)
>                     "metro_validate_forecast",
>                     # convert the string to the intermediate format
>                     "metro_string2csv_forecast",
>                     # read the file and put it in a string
>                     "metro_read_observation",
>                     # validate the well-formedness of the CSV file (optional)
>                     "metro_validate_observation",
>                     # convert the string to the intermediate format
>                     "metro_string2csv_observation",
>                     # read the file and put it in a string
>                     "metro_read_station",
>                     # validate the well-formedness of the CSV file (optional)
>                     "metro_validate_station",
>                     # convert the string to the intermediate format
>                     "metro_string2csv_station",
>                     # convert the intermediate format to the METRo data structure
>                     "metro_csv2metro",
>                     ...
> 
> I would do it that way. I would also try to reuse the metro_config file
> as much as possible to read your CSV file. For example, I would order
> the forecast CSV columns in the order given by:
> 
> --------------------------------------------------------------------------------------------------
> 
>    dConfig['XML_FORECAST_PREDICTION_STANDARD_ITEMS'] = \
>        {'VALUE' :[{'NAME':"FORECAST_TIME",
>                    'XML_TAG':"forecast-time",
>                    'DATA_TYPE':"DATE"},
> 
>                   {'NAME':"WS",
>                    'XML_TAG':"ws",
>                    'DATA_TYPE':"REAL"},
> 
>                   {'NAME':"AP",
>                    'XML_TAG':"ap",
>                    'DATA_TYPE':"REAL"},
> 
>                   {'NAME':"AT",
>                    'XML_TAG':"at",
>                    'DATA_TYPE':"REAL"},
> 
>                   {'NAME':"TD",
>                    'XML_TAG':"td",
>                    'DATA_TYPE':"REAL"},
> 
>                   {'NAME':"CC",
>                    'XML_TAG':"cc",
>                    'DATA_TYPE':"INTEGER"},
> 
>                   {'NAME':"SN",
>                    'XML_TAG':"sn",
>                    'DATA_TYPE':"REAL"},
> 
>                   {'NAME':"RA",
>                    'XML_TAG':"ra",
>                    'DATA_TYPE':"REAL"},
>                   ],
>         'FROM'     :CFG_INTERNAL,
>         'COMMENTS' :_("standard forecast prediction items")}
> ----------------------------------------------------------------------------------
> 
> I would also use the 'DATA_TYPE' to convert each string to the right
> type.
> 
> I hope this helps. If you have any questions, feel free to contact me.
> François
> 
> 
> Julie Prestopnik wrote:
>> Hi François.  Regarding the mixture of XML and CSV, I'm guessing we
>> probably wouldn't get much of an increase in performance that way, but
>> I'm not sure.  I say that only because if all of the XML code still has
>> to execute, it seems that it would take a similar amount of time as it
>> does now.  I think the CSV format we had in mind would be similar to
>> what you have below, except without the XML.
>>
>> Digging deeper in the code, it looks like the data is stored in an
>> xmlDoc object.  I'm thinking we would need to create an xmlDoc object
>> with our CSV data, but I'm not sure about that either.  From what you
>> know of the code, does that seem accurate?  Do you have any
>> input/suggestions about that?
>>
>> Thanks for your help,
>> Julie
>>
>> Francois Fortin wrote:
>>  
>>> Hi Julie,
>>> I thought I could easily add a CSV parser to my code. The format would
>>> have been a mixture of XML and CSV. Here is an example:
>>>
>>> <?xml version="1.0"?>
>>> <forecast>
>>>    <header>
>>>        <version>1.1</version>
>>>        <production-date>2004-01-30T12:00Z</production-date>
>>>    </header>
>>>    <prediction-list>
>>>        <prediction>2004-01-30T12:00Z,22,1,-11.00,-14.00,0,0.00,0.00,984.00</prediction>
>>>        <prediction>2004-01-30T13:00Z,20,1,-10.00,-13.00,7,0.00,0.00,988.00</prediction>
>>>        <prediction>2004-01-30T14:00Z,20,1,-10.00,-13.00,7,0.00,0.00,988.00</prediction>
>>>        <prediction>2004-01-30T15:00Z,20,1,-9.00,-13.00,7,0.00,0.00,988.00</prediction>
>>>        <prediction>2004-01-30T16:00Z,20,1,-8.00,-13.00,7,0.00,0.00,989.00</prediction>
>>>        <prediction>2004-01-30T17:00Z,20,1,-7.00,-12.00,7,0.00,0.00,989.00</prediction>
>>>    </prediction-list>
>>> </forecast>
>>>
>>> Unfortunately, this requires a bigger effort than I thought. Also, I'm
>>> not sure you would gain enough performance from that modification. Do
>>> you think it would be OK?
>>>
>>> Do you have an idea what your CSV format would be?
>>>
>>>
>>>
>>>
>>> Julie Prestopnik wrote:
>>>    
>>>> Thanks, François!  I'll look forward to getting your suggestions.
>>>>
>>>> Julie
>>>>
>>>> Francois Fortin wrote:
>>>>  
>>>>      
>>>>> Hi Julie,
>>>>>
>>>>> From now on we will reply to [EMAIL PROTECTED] This way our
>>>>> discussion will be broadcast to the other developers.
>>>>>
>>>>> I would prefer method 1 or 2. If you prefer method 3, it's not a
>>>>> problem, but maybe we can also add method 2 to force a particular
>>>>> file type.
>>>>>
>>>>> We would like to integrate it in METRo. It would be a great feature to
>>>>> have.
>>>>>
>>>>> I will look at the source code a bit before sending you my
>>>>> suggestions. I will try to do that by the end of the week.
>>>>>
>>>>> Thanks
>>>>> François
>>>>>
>>>>> Julie Prestopnik wrote:
>>>>>           
>>>>>> Hi François.  I'm working on the design right now.  My plan was to
>>>>>> use the Python debugger to step through the METRo modules from the
>>>>>> very beginning of program execution to see what modules would need
>>>>>> to be modified/added.
>>>>>>
>>>>>> My co-workers and I discussed the various options for letting METRo
>>>>>> know that it is being given a new type of input/output file and we
>>>>>> came up with three possible ways:
>>>>>>
>>>>>> 1. Changing the command line options.  For example, having
>>>>>> "--input-forecast-csv filename" instead of "--input-forecast
>>>>>> filename" (note the addition of -csv at the end of the option).
>>>>>>
>>>>>> 2. Adding another option to METRo.  For example, "--input-file-format
>>>>>> filetype" and "--output-file-format filetype", with filetype being
>>>>>> either "xml" or "csv".
>>>>>>
>>>>>> 3. Pushing the check further down.  For example, once the filenames
>>>>>> are loaded in, check for a .xml or .csv extension and proceed as
>>>>>> necessary.
>>>>>>
>>>>>> I don't think we really have a preference on which one to use.
>>>>>> Personally, I'm somewhat fond of the third option because the
>>>>>> interface to METRo would not change; however, it restricts the
>>>>>> user in their choice of input and output filenames.  Do you have a
>>>>>> preference?
>>>>>>
>>>>>> Regarding your question of knowing exactly what to do to allow for
>>>>>> the different input format, like I said, I've started using the
>>>>>> debugger to step through the code and have been making notes along
>>>>>> the way of what code appears to need modification.  Then, I was
>>>>>> going to dive in, start making changes, and hope for the best.  ;)
>>>>>>
>>>>>> Any suggestions from you would certainly be appreciated.  Is this
>>>>>> something you would consider integrating into and maintaining in
>>>>>> MetSurface if I get it working?
>>>>>>
>>>>>> Thanks,
>>>>>> Julie
>>>>>>
>>>>>> Francois Fortin wrote:
>>>>>>  
>>>>>>               
>>>>>>> Hi Julie,
>>>>>>> Do you have an idea how to do that? I have one, but I will need
>>>>>>> some time to look at it.
>>>>>>>
>>>>>>> Julie Prestopnik wrote:
>>>>>>>                      
>>>>>>>> Hello METRo developers.
>>>>>>>>
>>>>>>>> We (NCAR) are considering adding an option to allow for a new input
>>>>>>>> format (CSV) to METRo, since the XML I/O consumes most of the
>>>>>>>> processing time.  We'd like to add an option, either by adding to or
>>>>>>>> changing the command line interface or by pushing the check further
>>>>>>>> down (e.g. checking for file extension: .xml or .csv).
>>>>>>>>
>>>>>>>> We wanted to run this by everyone to see if you might have any
>>>>>>>> input, suggestions, or objections.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Julie
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> METRo-developers mailing list
>>>>>>>> [email protected]
>>>>>>>> https://mail.gna.org/listinfo/metro-developers


