Re: [ASCEND] Data Reader Efficient Use

John Pye Thu, 01 Oct 2020 01:48:01 -0700

Hi Andrew

Thanks for your question about ASCEND and the scalability of its data reader 
(DR) functionality. I wrote the original code there, and it was later expanded 
by José Zapata to add support for improved spline methods and additional data 
formats. The documentation, such as it is, is mostly in our wiki, here: 
https://ascend4.org/Data_reader.

You are proposing a really nice idea which is to share the in-memory instances
of the data tables that the DR uses, to save memory. I think it's a very good
idea.

As noted in the documentation, the DR uses the 'external relation' interface in
ASCEND. This means that the datareader.c code has to work within the
constraints of that API
(https://ascend4.org/Writing_ASCEND_external_relations_in_C). That should be
fine though; we get to implement a 'prepare' and a 'calculate' function for
each external relation, and hence for each DR instance. When writing a DR
extrel, the user gives details on the file format they want to use, the
specific data file they want to load, as well as the columns they want to
interpolate, using a 'data instance' such as that 'drconf' model mentioned in
the Data Reader wiki page. 'datareader.c' handles the interfacing of the Data
Reader with the user's ASCEND code.

The actual loading of the files is coordinated through dr.c (and dr.h). The
file format (datareader_set_format) is used to set up links to the relevant
file-format-specific routines that load the header and data rows from the data
file in an abstracted way.
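
As a rough illustration of that kind of abstraction (not the actual dr.c code), a format name can be mapped to a set of function pointers, so the generic loader never needs to know which parser it is calling. Everything below is hypothetical: the 'FormatOps' struct, the 'find_format' helper, the "WS" whitespace-separated format and the row-parser signatures are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical row parser: fills 'out' with up to 'maxcols' numbers taken
   from one text line and returns how many were read. */
typedef int RowFn(const char *line, double *out, int maxcols);

/* Hypothetical table entry linking a format name to its parser. */
typedef struct {
    const char *name;
    RowFn *read_row;
} FormatOps;

/* Comma-separated values: numbers must be separated by exactly one ','. */
int csv_read_row(const char *line, double *out, int maxcols) {
    int n = 0;
    const char *p = line;
    assert(line != NULL && out != NULL);
    while (n < maxcols) {
        char *end;
        double v = strtod(p, &end);
        if (end == p) break;        /* nothing numeric left */
        out[n++] = v;
        if (*end != ',') break;     /* end of row */
        p = end + 1;
    }
    return n;
}

/* Whitespace-separated values (strtod skips leading whitespace itself). */
int ws_read_row(const char *line, double *out, int maxcols) {
    int n = 0;
    const char *p = line;
    assert(line != NULL && out != NULL);
    while (n < maxcols) {
        char *end;
        double v = strtod(p, &end);
        if (end == p) break;
        out[n++] = v;
        p = end;
    }
    return n;
}

static const FormatOps formats[] = {
    {"CSV", csv_read_row},
    {"WS", ws_read_row},
};

/* Look up the routines for a format name; returns NULL if unknown. */
const FormatOps *find_format(const char *name) {
    size_t i;
    for (i = 0; i < sizeof formats / sizeof formats[0]; i++) {
        if (strcmp(formats[i].name, name) == 0) return &formats[i];
    }
    return NULL;
}
```

The generic code would then call find_format("CSV")->read_row(line, row, ncols) for each line, and supporting a new file type only means adding another entry to the table.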

So by the time we're talking about a specific kind of datafile, the key stuff
is happening eg in csv.c
(http://code.ascend4.org/ascend/trunk/models/johnpye/datareader/csv.c?revision=3291&view=markup).
csv.c loads the raw data from a CSV file and stores everything in the 'data'
field of the DataReader struct. That is the key part where some re-use might be
possible. It looks to me as though csv.c loads all of the columns from the
file, even if only some of them are actually requested by the user. That's
actually nice -- it means we can reuse the resulting 'data' pointer across
multiple DR instances.

If you were keen to implement an improvement here, I would say that a good
option would be to modify the dr.c code so that when a request comes in to load
a file of a given format, you first check a list of previously-loaded files and
formats (eg g_datareader_files?). If the file has already been loaded and with
the same format specifier, copy the 'data' pointer into the current DataReader
struct, increment a reference count, and go from there. If the file has not
been loaded, load it and add it to the list. Then there would need to be some
destructor code, perhaps in ascend/compiler/simlist.c, to free the shared data
once no DR instance refers to it any more.
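
A minimal sketch of that lookup-and-share step, assuming a simple linked list keyed on file name and format. Only the list name g_datareader_files comes from the suggestion above; 'SharedTable', 'shared_table_acquire', 'shared_table_release', the integer format code and the loader callback are hypothetical names, and 'demo_load' is just a stand-in for the real file parsing so that the sharing can be observed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry: one per (file name, format) pair loaded so far. */
typedef struct SharedTable {
    char *fn;                  /* file name, owned by this entry */
    int format;                /* format specifier (hypothetical integer code) */
    void *data;                /* parsed table shared by all users */
    int refcount;              /* number of data readers using 'data' */
    struct SharedTable *next;
} SharedTable;

SharedTable *g_datareader_files = NULL;

/* Hypothetical loader callback: parses the file, returns NULL on failure. */
typedef void *TableLoadFn(const char *fn, int format);

/* Return the shared table for (fn, format), loading it only the first time. */
void *shared_table_acquire(const char *fn, int format, TableLoadFn *load) {
    SharedTable *t;
    assert(fn != NULL && load != NULL);
    for (t = g_datareader_files; t != NULL; t = t->next) {
        if (t->format == format && strcmp(t->fn, fn) == 0) {
            t->refcount++;     /* already loaded: just share the pointer */
            return t->data;
        }
    }
    t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->fn = malloc(strlen(fn) + 1);
    if (t->fn == NULL) { free(t); return NULL; }
    strcpy(t->fn, fn);
    t->data = load(fn, format);
    if (t->data == NULL) { free(t->fn); free(t); return NULL; }
    t->format = format;
    t->refcount = 1;
    t->next = g_datareader_files;
    g_datareader_files = t;
    return t->data;
}

/* Drop one reference; destroy the table when the last user lets go. */
void shared_table_release(void *data, void (*destroy)(void *)) {
    SharedTable **pp;
    for (pp = &g_datareader_files; *pp != NULL; pp = &(*pp)->next) {
        SharedTable *t = *pp;
        if (t->data == data) {
            if (--t->refcount == 0) {
                *pp = t->next;
                destroy(t->data);
                free(t->fn);
                free(t);
            }
            return;
        }
    }
}

/* Stand-in loader for demonstration: counts how often a "file" is parsed. */
int g_load_calls = 0;
void *demo_load(const char *fn, int format) {
    (void)fn;
    (void)format;
    g_load_calls++;
    return calloc(4, sizeof(double));
}
```

With something like this in place, hundreds of DR instances pointing at the same CSV file would trigger a single parse. The release function is the piece that the destructor code would need to call for each DR instance.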

Another point on the scalability of the DR is that it uses linear search to
position within the file. The idea there was that our typical use-case was
time-stepping through a weather file, eg hourly throughout a year. However, in
other uses, it might be desirable to be able to jump around more efficiently.
In that case, there could be a binary search tree structure implemented for
faster access to the data. Depending on whether you are using large data files,
that might or might not be important.
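
To give a feel for the faster lookup, here is a small sketch that uses plain binary search over the sorted independent column (a simpler stand-in for a separate tree structure), followed by linear interpolation. It assumes the column is strictly increasing and has at least two rows; 'bracket_index' and 'interp_linear' are hypothetical names, not functions from the DR code, and the DR's own spline methods are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Return i such that x[i] <= xq < x[i+1], clamped to the range [0, n-2].
   Requires n >= 2 and x sorted in strictly increasing order. */
size_t bracket_index(const double *x, size_t n, double xq) {
    size_t lo = 0, hi;
    assert(x != NULL && n >= 2);
    hi = n - 1;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (x[mid] <= xq) {
            lo = mid;          /* query is at or to the right of mid */
        } else {
            hi = mid;          /* query is to the left of mid */
        }
    }
    return lo;
}

/* Linear interpolation of y(x) at xq. Queries outside the table are
   extrapolated linearly from the first or last interval. */
double interp_linear(const double *x, const double *y, size_t n, double xq) {
    size_t i = bracket_index(x, n, xq);
    double t = (xq - x[i]) / (x[i + 1] - x[i]);
    return y[i] + t * (y[i + 1] - y[i]);
}
```

Each lookup then costs O(log n) comparisons instead of O(n), which only starts to matter for large tables or queries that jump around a lot.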

If, on the other hand, you are looking for a different approach to this, you
might like to look at the FPROPS code in ASCEND. This specifically implements
fluid property data such as thermal conductivity, and reuses the data
structures automatically. In that case, you could implement your own fluid data
type. Some recent work I did with ASCEND was in the 'fprops-incomp' branch, for
modelling properties of incompressible fluids:
http://code.ascend4.org/ascend/branches/fprops-incomp/models/johnpye/fprops/.
You could look into that and see if it gives you an easier way forward. It has
the advantage of making your same evaluation routines also accessible from
Python, if that's of interest.

Hope this helps!

Cheers
JP

On 22/9/20 7:18 am, Andrew Stubblefield via Ascend-sim-users wrote:
Hello,

I am fairly new to ASCEND, and I have just discovered the Data Reader
external library function, which seems a good fit for my application. I would
like to use Data Reader to interpolate tabulated data from a CSV file with two
columns: temperature [K] and thermal conductivity integral [W/m]. I have
achieved the desired results so far by creating a separate data reader object
for each temperature variable. For instance, if I divide my part of interest
into four segments, then I declare four temperature variables and create four
data readers to access the same CSV file to return the thermal conductivity for
each of the four segments. Upon loading the model, I get four solver messages
that basically say "Created data reader at Memory Location" and "Read ###
Lines". So it appears that each data reader I create reads in all the
information from the CSV file.
Now I would like to discretize my part of interest into many more segments
(hundreds possibly). Using my current method, I will need to create hundreds
of data readers that all read in the same exact information. Is it possible to
create a single data reader that can be accessed by multiple temperature
variables to return the corresponding interpolated thermal conductivity
integral?

Thank you,
Andrew Stubblefield

