Hi Andrew

Thanks for your question about ASCEND and the scalability of its data reader 
(DR) functionality. I wrote the original code there, and it was later expanded 
by José Zapata to add support for improved spline methods and additional data 
formats. The documentation, such as it is, is mostly in our wiki, here: 
https://ascend4.org/Data_reader.

You are proposing to share the in-memory instances of the data tables that the 
DR uses, in order to save memory. I think it's a very good idea.

As noted in the documentation, the DR uses the 'external relation' interface in 
ASCEND. This means that the datareader.c code has to work within the 
constraints of that API 
(https://ascend4.org/Writing_ASCEND_external_relations_in_C). That should be 
fine though; we get to implement a 'prepare' and a 'calculate' function for 
each external relation, and hence for each DR instance. When writing a DR 
extrel, the user gives details on the file format they want to use, the 
specific data file they want to load, as well as the columns they want to 
interpolate, using a 'data instance' such as the 'drconf' model mentioned in 
the Data Reader wiki page. 'datareader.c' handles the interfacing of the Data 
Reader with the user's ASCEND code.

The actual loading of the files is coordinated through dr.c (and dr.h). The 
file format (datareader_set_format) is used to set up links to the relevant 
file-format-specific routines that load the header and data rows from the data 
file in an abstracted way.
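
To illustrate the dispatch idea only, here is a rough sketch of selecting 
format-specific routines through function pointers. This is not the actual 
dr.c code: the struct layout and the names Reader, reader_set_format, 
csv_header and csv_data are made up for the example, and the CSV routines are 
stubs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Reader Reader;
typedef int (*HeaderFn)(Reader *r); /* returns 0 on success */
typedef int (*DataFn)(Reader *r);   /* returns 0 on success */

/* Hypothetical reader object; the real DataReader struct differs. */
struct Reader {
    HeaderFn read_header;
    DataFn read_data;
    int header_done; /* set by the stub header routine */
    int data_done;   /* set by the stub data routine */
};

/* Stub CSV routines: a real reader would parse the file here. */
static int csv_header(Reader *r) { r->header_done = 1; return 0; }
static int csv_data(Reader *r) { r->data_done = 1; return 0; }

typedef struct {
    const char *name;
    HeaderFn header;
    DataFn data;
} FormatEntry;

/* Table of known formats; only a placeholder CSV entry is listed. */
static const FormatEntry formats[] = {
    { "CSV", csv_header, csv_data },
};

/* Look up the format by name and wire its routines into the reader.
   Returns 0 on success, 1 if the format is unknown. */
int reader_set_format(Reader *r, const char *format) {
    size_t i;
    assert(r != NULL && format != NULL);
    for (i = 0; i < sizeof formats / sizeof formats[0]; i++) {
        if (strcmp(formats[i].name, format) == 0) {
            r->read_header = formats[i].header;
            r->read_data = formats[i].data;
            return 0;
        }
    }
    return 1;
}
```

After reader_set_format(&r, "CSV") succeeds, the rest of the code can call 
r.read_header(&r) and r.read_data(&r) without knowing which format is in use.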

So by the time we're talking about a specific kind of datafile, the key stuff 
is happening eg in csv.c 
(http://code.ascend4.org/ascend/trunk/models/johnpye/datareader/csv.c?revision=3291&view=markup).
csv.c loads the raw data from a CSV file and stores everything in the 'data' 
field of the DataReader struct. That is the key part where some re-use might be 
possible. It looks to me as though csv.c loads all of the columns from the 
file, even if the user only requests some of them. That's actually nice: it 
means we can reuse the resulting 'data' pointer across multiple DR instances.

If you were keen to implement an improvement here, I would say that a good 
option would be to modify the dr.c code so that when a request comes in to load 
a file of a given format, you first check a list of previously-loaded files and 
formats (eg g_datareader_files?). If the file has already been loaded with 
the same format specifier, copy the 'data' pointer into the current DataReader 
struct, increment a reference count, and go from there. If the file has not 
been loaded, load it and add it to the list. There would then also need to be 
some destructor code, perhaps in ascend/compiler/simlist.c, to free the shared 
data once nothing refers to it any more.
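
As a rough sketch of that idea (not existing ASCEND code), a reference-counted 
cache keyed on file name and format might look like the following. Apart from 
g_datareader_files, which is the name floated above, everything here is 
hypothetical: SharedTable, shared_table_acquire, shared_table_release, LoadFn 
and the toy dummy_load loader are invented for the example, and the real 'data' 
field of the DataReader struct may be laid out differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the table a format reader such as csv.c builds. */
typedef struct SharedTable {
    char *path;    /* file name, part of the cache key */
    char *format;  /* format specifier, part of the cache key */
    double *data;  /* row-major values, nrows * ncols */
    size_t nrows, ncols;
    int refcount;  /* number of DR instances using this table */
    struct SharedTable *next;
} SharedTable;

static SharedTable *g_datareader_files = NULL; /* list of loaded files */

/* A loader returns a malloc'd table, or NULL on failure. */
typedef double *(*LoadFn)(const char *path, size_t *nrows, size_t *ncols);

static char *dup_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d) memcpy(d, s, n);
    return d;
}

/* Return the cached table for (path, format), loading it on first use. */
SharedTable *shared_table_acquire(const char *path, const char *format,
                                  LoadFn load) {
    SharedTable *t;
    for (t = g_datareader_files; t; t = t->next) {
        if (strcmp(t->path, path) == 0 && strcmp(t->format, format) == 0) {
            t->refcount++; /* already loaded: share it */
            return t;
        }
    }
    t = calloc(1, sizeof *t);
    if (!t) return NULL;
    t->path = dup_string(path);
    t->format = dup_string(format);
    if (t->path && t->format) t->data = load(path, &t->nrows, &t->ncols);
    if (!t->data) { /* allocation or load failed: undo and report */
        free(t->path); free(t->format); free(t);
        return NULL;
    }
    t->refcount = 1;
    t->next = g_datareader_files;
    g_datareader_files = t;
    return t;
}

/* Drop one reference; free the table when the last user lets go. */
void shared_table_release(SharedTable *t) {
    SharedTable **p;
    assert(t != NULL && t->refcount > 0);
    if (--t->refcount > 0) return;
    for (p = &g_datareader_files; *p; p = &(*p)->next) {
        if (*p == t) { *p = t->next; break; }
    }
    free(t->data); free(t->path); free(t->format); free(t);
}

/* Toy loader for demonstration: a 2x2 table of arbitrary placeholder values.
   It counts its calls so the sharing can be observed. */
int dummy_load_calls = 0;
double *dummy_load(const char *path, size_t *nrows, size_t *ncols) {
    double *d = malloc(4 * sizeof *d);
    (void)path; /* a real loader would open and parse this file */
    if (!d) return NULL;
    d[0] = 1.0; d[1] = 10.0; d[2] = 2.0; d[3] = 20.0;
    *nrows = 2; *ncols = 2;
    dummy_load_calls++;
    return d;
}
```

With this arrangement, two calls to shared_table_acquire("k.csv", "CSV", 
dummy_load) return the same pointer and the loader runs only once. Each DR 
instance would call shared_table_release from its destructor.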

Another point on the scalability of the DR is that it uses linear search to 
position within the file. The idea there was that our typical use-case was 
time-stepping through a weather file, eg hourly throughout a year. However, in 
other uses, it might be desirable to be able to jump around more efficiently. 
In that case, there could be a binary search tree structure implemented for 
faster access to the data. Depending on whether you are using large data files, 
that might or might not be important.
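
For illustration, here is a minimal sketch of the faster lookup, using a plain 
binary search over a sorted column rather than a tree structure, followed by 
linear interpolation. It assumes the independent variable is strictly 
increasing. The names bracket_index and interp_linear are made up for the 
example, and the real DR interpolation (including the spline methods) works 
differently.

```c
#include <assert.h>
#include <stddef.h>

/* Return i such that x[i] <= xq < x[i+1], clamped to the range [0, n-2].
   Assumes x is strictly increasing and n >= 2. Runs in O(log n). */
size_t bracket_index(const double *x, size_t n, double xq) {
    size_t lo = 0, hi;
    assert(x != NULL && n >= 2);
    hi = n - 1;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (x[mid] <= xq) lo = mid;
        else hi = mid;
    }
    return lo;
}

/* Linear interpolation of y at xq; values outside the table are linearly
   extrapolated from the first or last interval. */
double interp_linear(const double *x, const double *y, size_t n, double xq) {
    size_t i = bracket_index(x, n, xq);
    double t = (xq - x[i]) / (x[i + 1] - x[i]);
    return y[i] + t * (y[i + 1] - y[i]);
}
```

For sequential time-stepping the existing linear search is likely fine, since 
each lookup lands near the previous one. The binary search pays off when the 
lookups jump around a large table.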

If, on the other hand, you are looking for a different approach to this, you 
might like to look at the FPROPS code in ASCEND. This specifically implements 
fluid property data such as thermal conductivity, and reuses the data 
structures automatically. In that case, you could implement your own fluid data 
type. Some recent work I did with ASCEND was in the 'fprops-incomp' branch, for 
modelling properties of incompressible fluids: 
http://code.ascend4.org/ascend/branches/fprops-incomp/models/johnpye/fprops/ 
You could look into that and see if it gives you an easier way forward. It has 
the advantage of making your same evaluation routines also accessible from 
Python, if that's of interest.

Hope this helps!

Cheers
JP

On 22/9/20 7:18 am, Andrew Stubblefield via Ascend-sim-users wrote:
Hello,

    I am fairly new to ASCEND, and I have just discovered the Data Reader 
external library function, which seems a good fit for my application.  I would 
like to use Data Reader to interpolate tabulated data from a CSV file with two 
columns: temperature [K] and thermal conductivity integral [W/m].  I have 
achieved the desired results so far by creating a separate data reader object 
for each temperature variable.  For instance, if I divide my part of interest 
into four segments, then I declare four temperature variables and create four 
data readers to access the same CSV file to return the thermal conductivity for 
each of the four segments.  Upon loading the model, I get four solver messages 
that basically say "Created data reader at Memory Location" and "Read ### 
Lines".  So it appears that each data reader I create reads in all the 
information from the CSV file.
    Now I would like to discretize my part of interest into many more segments 
(hundreds possibly).  Using my current method, I will need to create hundreds 
of data readers that all read in the same exact information.  Is it possible to 
create a single data reader that can be accessed by multiple temperature 
variables to return the corresponding interpolated thermal conductivity 
integral?

Thank you,
Andrew Stubblefield
