Re: Data Type Mapping

Nathan Potter Mon, 06 Aug 2007 15:53:30 -0700


Edward et al.,

Perhaps I should have prefaced my request with an explanation of what I am up to. I am working on the REAP project, which I imagine you are already familiar with. (REAP's PIs include the lead PIs that initiated the Kepler project, e.g., Ludäscher, Jones, Altintas, etc.)

Part of the REAP project is to provide data integration with OPeNDAP data sources (www.opendap.org). From my (so far limited) view of PTII and Kepler, it seems that many of the OPeNDAP data sources include data sets whose structure comes from scientific domains that are new to PTII and Kepler, and in many cases they include 4D data. Global atmospheric models and satellite orbits are two examples of such data.

The OPeNDAP data model and the PTII data model have many similarities, but they diverge in that:


- The OPeNDAP data model is optimized for multidimensional arrays.

- The OPeNDAP data model supports a Grid data type that encapsulates map vector semantics for multidimensional gridded data.

Both of these data types represent high-value content that the REAP project would like to make available in PTII/Kepler.
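
As a rough illustration of the Grid idea mentioned above, the following Python sketch pairs an N-dimensional array with one 1D map vector per dimension, so that an integer index can be translated into coordinate values. The class name `Grid`, its fields, and the sample values are hypothetical and chosen only for this example; this is not the OPeNDAP reference implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Grid:
    """Hypothetical stand-in for a Grid: an N-D array plus one map vector per dimension."""
    name: str
    shape: Sequence[int]      # size of each dimension
    values: List[float]       # array data, flattened in row-major order
    maps: List[List[float]]   # maps[d][i] is the coordinate of index i along dimension d

    def __post_init__(self) -> None:
        if len(self.maps) != len(self.shape):
            raise ValueError("need exactly one map vector per dimension")
        expected = 1
        for size, map_vector in zip(self.shape, self.maps):
            if len(map_vector) != size:
                raise ValueError("map vector length must match its dimension")
            expected *= size
        if len(self.values) != expected:
            raise ValueError("number of values does not match shape")

    def coordinates(self, index: Sequence[int]) -> Tuple[float, ...]:
        """Translate an integer index tuple into coordinate values via the map vectors."""
        return tuple(m[i] for m, i in zip(self.maps, index))


# A 2 x 3 grid on a latitude/longitude lattice (made-up values).
sst = Grid(
    name="sst",
    shape=(2, 3),
    values=[10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    maps=[[-10.0, 10.0], [0.0, 120.0, 240.0]],
)
print(sst.coordinates((1, 2)))
```

The point of the sketch is that the map vectors travel with the array, so any mapping into another data model has to carry both parts to keep the gridded semantics.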


I hope a little background helps to further our discussion.


On Aug 6, 2007, at 5:39 AM, Edward A. Lee wrote:


Nathan,

Perhaps it would help to explain the reason for the current
design:

The existence of MatrixToken types has two motivations.
First, matrices have a natural set of operations that require
fairly sophisticated libraries to support (multiplication,
inverse, eigenvalue computation, etc.). As far as I know,
there are no such natural operations for higher dimensional
matrices.  Second, algorithms using matrices need to be
efficient. Data should be represented using native Java
data types, not wrapped in Tokens.

That kind of optimization is very useful, and is exactly what is done for all arrays in the reference implementations of the OPeNDAP data model.

In my experience, how to represent higher dimensional
data really depends on the application.  E.g., images
are 2-D, and can be represented in matrix types.
Video is 3-D, and is naturally represented as a sequence
of matrix tokens.

By mapping the OPeNDAP data model into PTII/Kepler we will be making available many multidimensional data sets, including 4D atmospheric models. Many of these data sets are quite large, so storage/processing optimizations like the one used in the MatrixToken data type are generally favored.

Part of this process is to determine how to most effectively represent this information in the target (PTII/Kepler) data model. This should be driven in part by the tools available in the application for subsetting and slicing the data.

At this point, based on your comments and those posted by C. Brooks, my instinct is to develop a prototype using MatrixTokens for all 2D arrays and nested ArrayTokens elsewhere, as required. You make an excellent point about the optimization of using the native Java types for arrays, and promoting individual array values to ArrayToken objects will probably create serious memory usage and performance issues in the long run.
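
The storage concern above can be sketched in a few lines of Python. This is a minimal illustration, not PTII code: the class `FlatNDArray` and its methods are hypothetical. It keeps all values in one contiguous primitive buffer (row-major order) instead of wrapping each value in an object, and it can hand back a 2D slab, which is the kind of result that could then be mapped onto a matrix type.

```python
from array import array


class FlatNDArray:
    """Hypothetical N-D array backed by a single primitive buffer (row-major)."""

    def __init__(self, shape, data):
        self.shape = tuple(shape)
        self.data = array("d", data)  # one contiguous buffer of C doubles
        size = 1
        for n in self.shape:
            size *= n
        if len(self.data) != size:
            raise ValueError("data length does not match shape")
        # strides[d] = number of elements skipped when index d increases by one
        self.strides = []
        step = 1
        for n in reversed(self.shape):
            self.strides.insert(0, step)
            step *= n

    def get(self, *index):
        """Return a single element; no bounds checking in this sketch."""
        offset = sum(i * s for i, s in zip(index, self.strides))
        return self.data[offset]

    def plane(self, *leading):
        """Fix all but the last two indices and return the remaining 2D slab as rows."""
        if len(leading) != len(self.shape) - 2:
            raise ValueError("must fix all but the last two dimensions")
        rows, cols = self.shape[-2:]
        base = sum(i * s for i, s in zip(leading, self.strides))
        return [
            list(self.data[base + r * cols: base + (r + 1) * cols])
            for r in range(rows)
        ]


# A 2 x 2 x 3 array holding the values 0.0 through 11.0.
cube = FlatNDArray((2, 2, 3), range(12))
print(cube.get(1, 0, 2))
print(cube.plane(1))
```

Element lookup is a single offset computation into the buffer, so no per-element wrapper objects are created however many dimensions the array has.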


I can see a couple of alternate mechanisms for addressing this:

- Extend the PTII/Kepler data model with a more generalized array type that uses an efficient storage mechanism and that provides a reasonable interface for sub-setting/slicing/dicing the array. I am not suggesting a set of linear-algebra-style functions, just sub-setting methods. (I think one goal here would be that if a sub-setting activity produces a 2D array result then it should get mapped to a MatrixToken type.)

- OPeNDAP servers can perform server-side sub-setting. It may be that the most expedient thing would be to force the sub-setting to happen on the origin server and only import 1D or 2D subsets into the Ptolemy environment.
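
To make the server-side option concrete, here is a small Python sketch that builds a DAP2-style constraint expression, in which each array dimension is followed by a `[start:stride:stop]` range with an inclusive stop index. The helper names, the server URL, and the variable `temp` with its dimension sizes are all assumptions made for this example.

```python
def hyperslab(variable, *ranges):
    """Build an array constraint such as 'temp[0:1:0][10:1:20]'.

    Each range is a (start, stride, stop) tuple with an inclusive stop index.
    """
    parts = []
    for start, stride, stop in ranges:
        if start < 0 or stride < 1 or stop < start:
            raise ValueError(f"invalid range: {(start, stride, stop)}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return variable + "".join(parts)


def subset_url(dataset_url, variable, *ranges):
    """Append the constraint to a dataset URL, requesting the binary (.dods) response."""
    return f"{dataset_url}.dods?{hyperslab(variable, *ranges)}"


# Hypothetical 4D variable temp[time][level][lat][lon]: one time step, one level,
# and the full lat/lon plane, so the response is effectively a 2D array.
url = subset_url(
    "http://example.org/opendap/model.nc",  # placeholder server and dataset
    "temp",
    (0, 1, 0), (5, 1, 5), (0, 1, 89), (0, 1, 179),
)
print(url)
```

Pinning the leading dimensions to a single index in this way means only a 1D or 2D subset ever crosses the network. Some HTTP clients may need the square brackets percent-encoded before the request is sent.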


You have readily available the following mechanisms
for adding dimensions:
 - sequences (streams)
 - arrays (and arrays of arrays)
 - records
Which to choose will depend on the modeling problem, I think.

Note that a while ago, we did some major research into
representations of multidimensional data via generalized streams
(multidimensional streams).  See:

http://ptolemy.eecs.berkeley.edu/publications/papers/02/synchronous/

I followed this link to a page containing the abstract for the paper you mentioned. Unfortunately the link on that page to the full paper:

http://ptolemy.eecs.berkeley.edu/publications/papers/02/synchronous/MurthyLee_MultimensionalSDF.pdf

returns a 404 Not Found. Is there another location that you know of? I would be interested in reading the document.



Thank you all for your thoughtful responses,



Nathan


This mechanism was implemented in Ptolemy Classic, but never ported
to Ptolemy II.  Probably some good research to be done here still...

Edward


At 07:57 PM 8/3/2007, Nathan Potter wrote:

Greetings,

I am looking at how I might represent an N dimensional array in the
ptolemy data model.

There is an obvious mapping for 1D (ArrayToken) and 2D (MatrixToken)
arrays. But when I look at what I might do to map higher dimensions I
get stopped by my lack of knowledge regarding the way that users
expect to see data represented in ptolemy/kepler.

I imagine I could make ArrayTokens whose members are ArrayTokens
whose members are ArrayTokens whose...

OR

I could use nested RecordTokens in much the same way.

Which is preferable?

There is a RecordDisassembler actor that could probably pick apart
the latter, but the former seems to better preserve the semantic
relationships of the dimensions.

Ultimately I suppose the question is: Are there any actors in the
library that are designed to deal with either construct? Or are
multidimensional arrays a relatively foreign type of data
organization in ptolemy/kepler?


Thanks,


Nathan



= = =
Nathan Potter                        ndp at opendap.org
OPeNDAP, Inc.                        541.752.1852

----------------------------------------------------------------------------
Posted to the ptolemy-hackers mailing list. Please send administrative
mail for this list to: ptolemy-hackers-[EMAIL PROTECTED]


------------
Edward A. Lee
Chair of EECS and Robert S. Pepper Distinguished Professor
231 Cory Hall, UC Berkeley, Berkeley, CA 94720-1770
phone: 510-642-0253, fax: 510-642-2845
[EMAIL PROTECTED], http://ptolemy.eecs.berkeley.edu/~eal


= = =
Nathan Potter                        ndp at opendap.org
OPeNDAP, Inc.                        541.752.1852





