Re: (De-)serializing collections/datasets

Karl Wettin Thu, 07 Feb 2008 11:16:49 -0800

On 7 Feb 2008, at 19:22, Ted Dunning wrote:

There are many alternatives.
The simplest is tab-delimited files with a header line. That works pretty
well almost all of the time.
For instance, most of the UCI datasets are in pretty much that format. Most
of my data sets wind up in that format. Anything from a relational database
falls into that format pretty easily.

I'd say that is more or less the same thing as ARFF, except that ARFF has a typed header, is comma-delimited, and can optionally be stored in a sparse mode.
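
To make the comparison concrete, here is a rough sketch that turns tab-delimited text with a header line into dense ARFF. The function name tsv_to_arff and the type inference rule (a column is numeric if every value parses as a float, otherwise nominal) are assumptions made for illustration, not part of Weka or of any proposed API. It does not handle missing values, quoting of values containing spaces or commas, or the sparse ARFF mode.

```python
import csv
import io


def tsv_to_arff(tsv_text, relation="data"):
    """Convert tab-delimited text with a header line into dense ARFF text."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, records = rows[0], rows[1:]

    def is_numeric(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = ["@relation " + relation, ""]
    for col, name in enumerate(header):
        values = [record[col] for record in records]
        if all(is_numeric(v) for v in values):
            # Assumed rule: every value parses as a float, so call it numeric.
            lines.append("@attribute %s numeric" % name)
        else:
            # Otherwise treat it as nominal, listing values in first-seen order.
            distinct = list(dict.fromkeys(values))
            lines.append("@attribute %s {%s}" % (name, ",".join(distinct)))
    lines += ["", "@data"]
    # The data section is the same rows, comma-delimited instead of tab-delimited.
    lines += [",".join(record) for record in records]
    return "\n".join(lines) + "\n"
```

The only information ARFF adds here is the attribute types, which the sketch has to guess from the data because a plain header line carries names only.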

The file format is not that important to me (and of course it should be an interchangeable strategy); all I want is to get started on a data access API. At this point I don't care what matrix or whatnot will be used for speedy access; I want the API used to load the matrix with data: a seekable instance enumerator working straight off the file system. InstanceReader, InstanceWriter. It would allow me to get started with preprocessing filters (resampling, discretization, etc.).
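
Something along these lines is what I have in mind, as a minimal sketch only. The names InstanceReader and InstanceWriter come from the paragraph above; everything else (the method names, the tab-delimited line format, indexing byte offsets up front, returning values as plain strings) is an assumption chosen to keep the example short, and it is written in Python purely for brevity.

```python
import io


class InstanceWriter:
    """Writes a header line, then one tab-delimited instance per line."""

    def __init__(self, stream, attribute_names):
        self.stream = stream
        self._write_line(attribute_names)

    def _write_line(self, values):
        line = "\t".join(str(v) for v in values) + "\n"
        self.stream.write(line.encode("utf-8"))

    def write(self, values):
        self._write_line(values)


class InstanceReader:
    """Seekable enumerator over instances stored one per line."""

    def __init__(self, stream):
        self.stream = stream
        self.stream.seek(0)
        self.attribute_names = self._split(self.stream.readline())
        # Record the byte offset of every instance once, so that seeking
        # to instance i later does not require rescanning the file.
        self.offsets = []
        while True:
            pos = self.stream.tell()
            if not self.stream.readline():
                break
            self.offsets.append(pos)
        self.position = 0

    @staticmethod
    def _split(line):
        return line.decode("utf-8").rstrip("\n").split("\t")

    def __len__(self):
        return len(self.offsets)

    def seek(self, index):
        """Make instance number `index` the next one returned."""
        self.position = index

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= len(self.offsets):
            raise StopIteration
        self.stream.seek(self.offsets[self.position])
        self.position += 1
        return self._split(self.stream.readline())
```

The stream is assumed to be a seekable binary stream, for example a file opened with open(path, "rb") or an io.BytesIO. A resampling filter could then be written against the reader alone, by seeking to whichever instance indices it draws, without knowing whether the underlying file is tab-delimited, ARFF, or something else.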




  karl




On 2/7/08 10:15 AM, "Karl Wettin" <[EMAIL PROTECTED]> wrote:


On 5 Feb 2008, at 00:51, Grant Ingersoll wrote:

I haven't used Weka much, is ARFF


I never used anything but Weka that much, are there any alternatives
to ARFF?
