Edward et al.,


Perhaps I should have prefaced my request with an explanation of what I am up to. I am working on the REAP project, which I imagine you are already familiar with. (REAP's PIs include the lead PIs who initiated the Kepler project, e.g., Ludäscher, Jones, Altintas.)

Part of the REAP project is to provide data integration with OPeNDAP data sources (www.opendap.org). From my (so far limited) view of PTII and Kepler, it seems that many of the OPeNDAP data sources include data sets whose structure comes from scientific domains that are new to PTII and Kepler, and that in many cases include 4D data. Global atmospheric models and satellite orbits are two examples of such data.

The OPeNDAP data model and the PTII data model have many similarities, but they diverge in that:

- The OPeNDAP data model is optimized for multidimensional arrays.
- The OPeNDAP data model supports a Grid data type that encapsulates map vector semantics for multidimensional gridded data.

Both of these data types represent high value content that the REAP project would like to make available in PTII/Kepler.
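
To make the Grid idea concrete, here is a minimal Java sketch of an array of values paired with one "map" vector per dimension. The class name GridSketch and all of its members are hypothetical, invented for illustration; this is neither the OPeNDAP reference implementation nor an existing PTII/Kepler type. It assumes double-valued data stored in row-major order.

```java
// Hypothetical illustration only: not an OPeNDAP or Ptolemy II class.
// A Grid pairs an N-dimensional array with one 1-D map vector per dimension.
final class GridSketch {
    final String[] dimNames; // e.g. {"lat", "lon"}
    final double[][] maps;   // maps[d][i] = coordinate of index i along dimension d
    final double[] values;   // array values in one flat, row-major primitive array

    GridSketch(String[] dimNames, double[][] maps, double[] values) {
        if (dimNames.length != maps.length) {
            throw new IllegalArgumentException("one map vector per dimension is required");
        }
        long size = 1;
        for (double[] map : maps) {
            size *= map.length;
        }
        if (size != values.length) {
            throw new IllegalArgumentException("map lengths do not match the number of values");
        }
        this.dimNames = dimNames;
        this.maps = maps;
        this.values = values;
    }

    // Row-major offset of a cell; the length of each dimension is the length of its map.
    int offset(int... index) {
        if (index.length != maps.length) {
            throw new IllegalArgumentException("expected " + maps.length + " indices");
        }
        int off = 0;
        for (int d = 0; d < index.length; d++) {
            if (index[d] < 0 || index[d] >= maps[d].length) {
                throw new IndexOutOfBoundsException("index out of range in dimension " + d);
            }
            off = off * maps[d].length + index[d];
        }
        return off;
    }

    // Value stored at the given cell.
    double get(int... index) {
        return values[offset(index)];
    }

    // Coordinate values (one per dimension) that the map vectors assign to a cell.
    double[] coordinates(int... index) {
        double[] coords = new double[index.length];
        for (int d = 0; d < index.length; d++) {
            coords[d] = maps[d][index[d]];
        }
        return coords;
    }
}
```

With a 3 x 2 grid over "lat" and "lon", get(1, 0) reads the cell value and coordinates(1, 0) returns the latitude and longitude attached to that cell, which is the map vector semantics a plain array lacks.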

I hope a little background helps to further our discussion.


On Aug 6, 2007, at 5:39 AM, Edward A. Lee wrote:



Nathan,

Perhaps it would help to explain the reason for the current
design:

The existence of MatrixToken types has two motivations.
First, matrices have a natural set of operations that require
fairly sophisticated libraries to support (multiplication,
inverse, eigenvalue computation, etc.). As far as I know,
there are no such natural operations for higher dimensional
matrices.  Second, algorithms using matrices need to be
efficient. Data should be represented using native Java
data types, not wrapped in Tokens.



That kind of optimization is very useful, and is exactly what is done for all arrays in the reference implementations of the OPeNDAP data model.



In my experience, how to represent higher dimensional
data really depends on the application.  E.g., images
are 2-D, and can be represented in matrix types.
Video is 3-D, and is naturally represented as a sequence
of matrix tokens.



By mapping the OPeNDAP data model into PTII/Kepler we will be making available many multidimensional data sets, including 4D atmospheric models. Many of these data sets are quite large, so storage and processing optimizations like the one used in the MatrixToken data type are generally favored.

Part of this process is to determine how to most effectively represent this information in the target (PTII/Kepler) data model. This should be driven in part by the tools available in the application for sub-setting and slicing the data.

At this point, based on your comments and those posted by C. Brooks, my instinct is to develop a prototype that uses MatrixTokens for all 2D arrays and nested ArrayTokens elsewhere, as required. You make an excellent point about the optimization of using native Java types for arrays, and promoting individual array values to ArrayToken objects will probably create serious memory usage and performance issues in the long run.

I can see a couple of alternative mechanisms for addressing this:

- Extend the PTII/Kepler data model with a more generalized array type that uses an efficient storage mechanism and that provides a reasonable interface for sub-setting/slicing/dicing the array. I am not suggesting a set of linear-algebra-style functions, just sub-setting methods. (I think one goal here would be that if a sub-setting activity produces a 2D array result, then it should get mapped to a MatrixToken type.)

- OPeNDAP servers can perform server-side sub-setting. It may be that the most expedient thing would be to force the sub-setting to happen on the origin server and import only 1D or 2D subsets into the Ptolemy environment.
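
As a rough sketch of both options, the Java below keeps an N-dimensional array in a single primitive double[] and copies out a 2D slice as a double[][], the kind of value a matrix type could be built from. It also formats a DAP2-style hyperslab constraint of the form variable[start:stride:stop], which a client appends after a "?" to the dataset URL to request server-side sub-setting. The class NdArraySketch, its methods, and the variable name "temp" are hypothetical; nothing here is an existing PTII/Kepler or OPeNDAP API.

```java
// Hypothetical illustration only: not an existing Ptolemy II, Kepler, or OPeNDAP class.
final class NdArraySketch {
    final int[] shape;   // length of each dimension
    final double[] data; // row-major values in one primitive array, no per-element objects

    NdArraySketch(int[] shape, double[] data) {
        int size = 1;
        for (int n : shape) {
            size *= n;
        }
        if (size != data.length) {
            throw new IllegalArgumentException("shape does not match data length");
        }
        this.shape = shape.clone();
        this.data = data;
    }

    // Read one element by converting the index to a row-major offset.
    double get(int... index) {
        if (index.length != shape.length) {
            throw new IllegalArgumentException("expected " + shape.length + " indices");
        }
        int off = 0;
        for (int d = 0; d < shape.length; d++) {
            if (index[d] < 0 || index[d] >= shape[d]) {
                throw new IndexOutOfBoundsException("index out of range in dimension " + d);
            }
            off = off * shape[d] + index[d];
        }
        return data[off];
    }

    // Copy out a 2D slice: rowDim and colDim vary, every other dimension is
    // held at the value given in fixed. The entries of fixed at rowDim and
    // colDim are placeholders and are overwritten while iterating.
    double[][] slice2D(int rowDim, int colDim, int[] fixed) {
        int[] index = fixed.clone();
        double[][] out = new double[shape[rowDim]][shape[colDim]];
        for (int r = 0; r < shape[rowDim]; r++) {
            index[rowDim] = r;
            for (int c = 0; c < shape[colDim]; c++) {
                index[colDim] = c;
                out[r][c] = get(index);
            }
        }
        return out;
    }

    // Format a hyperslab constraint; each entry of ranges is {start, stride, stop}.
    static String constraint(String variable, int[][] ranges) {
        StringBuilder sb = new StringBuilder(variable);
        for (int[] r : ranges) {
            sb.append('[').append(r[0]).append(':').append(r[1]).append(':').append(r[2]).append(']');
        }
        return sb.toString();
    }
}
```

For a 4D variable, the same request could be made either way: slice2D on a locally held array, or a constraint such as the one produced by constraint("temp", ...) so that only the 2D subset ever leaves the server. The second avoids transferring and storing the full array, at the cost of one request per subset.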





You have readily available the following mechanisms
for adding dimensions:
 - sequences (streams)
 - arrays (and arrays of arrays)
 - records
Which to choose will depend on the modeling problem, I think.

Note that a while ago, we did some major research into
representations of multidimensional data via generalized streams
(multidimensional streams).  See:

http://ptolemy.eecs.berkeley.edu/publications/papers/02/synchronous/


I followed this link to a page containing the abstract for the paper you mentioned. Unfortunately the link on that page to the full paper:

http://ptolemy.eecs.berkeley.edu/publications/papers/02/synchronous/MurthyLee_MultimensionalSDF.pdf

returns a 404 Not Found. Is there another location that you know of? I would be interested in reading the document.


Thank you all for your thoughtful responses,



Nathan






This mechanism was implemented in Ptolemy Classic, but never ported
to Ptolemy II.  Probably some good research to be done here still...

Edward


At 07:57 PM 8/3/2007, Nathan Potter wrote:


Greetings,

I am looking at how I might represent an N dimensional array in the
ptolemy data model.

There is an obvious mapping for 1D (ArrayToken) and 2D (MatrixToken)
arrays. But when I look at what I might do to map higher dimensions I
get stopped by my lack of knowledge regarding the way that users
expect to see data represented in ptolemy/kepler.

I imagine I could make ArrayTokens whose members are ArrayTokens
whose members are ArrayTokens whose...

OR

I could use nested RecordTokens in much the same way.

Which is preferable?

There is a RecordDisassembler actor that could probably pick apart
the latter, but the former seems to better preserve the semantic
relationships of the dimensions.

Ultimately I suppose the question is: Are there any actors in the
library that are designed to deal with either construct? Or are
multidimensional arrays a relatively foreign type of data
organization in ptolemy/kepler?


Thanks,


Nathan



= = =
Nathan Potter                        ndp at opendap.org
OPeNDAP, Inc.                        541.752.1852



----------------------------------------------------------------------------
Posted to the ptolemy-hackers mailing list.  Please send administrative
mail for this list to: ptolemy-hackers-[EMAIL PROTECTED]



------------
Edward A. Lee
Chair of EECS and Robert S. Pepper Distinguished Professor
231 Cory Hall, UC Berkeley, Berkeley, CA 94720-1770
phone: 510-642-0253, fax: 510-642-2845
[EMAIL PROTECTED], http://ptolemy.eecs.berkeley.edu/~eal



= = =
Nathan Potter                        ndp at opendap.org
OPeNDAP, Inc.                        541.752.1852




