Hi Joseph,

I'm making some data visualizations and, although I don't have advice on conceptual design, I share part of the practical problem: working with CSV values in a Smalltalk environment, sometimes with a lot of records (my recent project works with 270k of them). The visualization I made is documented broadly at [1]; essentially, I created a "PublishedMedInfo class >> loadDataFromCSV: aFile usingDelimiter: aCharacter" method that fills out my domain objects, which came from an Excel (and then CSV) file.

[1] http://mutabit.com/offray/blog/en/entry/sdv-infomed

For my recent project [2] I'm using a SQLite bridge between Pharo and the data imported from CSV. That way I'm delegating storage and querying (including duplicates) to a small but potent database back-end, while using objects to model the "higher" concerns of my domain. I know there are worries about the object-database impedance mismatch, but working with data and its visualization/reporting lets you build bridges, delegating the former to the database and the latter to objects, and using the strengths of each in its own place.

[2] https://twitter.com/offrayLC/status/725314838696701957
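To make the "delegate storage and duplicate queries to SQLite" idea concrete, here is a minimal sketch. It is written in Python with the standard sqlite3 module rather than with the Pharo bridge, only so it can be run anywhere. The table name "records" and the three columns (day, payee, amount) are hypothetical, not the schema of either project.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_text, delimiter=","):
    """Load CSV text into an in-memory SQLite table.

    The schema (day, payee, amount) is an assumption for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (day TEXT, payee TEXT, amount REAL)")
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    next(reader)  # skip the header row
    # Ignore blank lines, which csv.reader yields as empty lists.
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        (row for row in reader if row),
    )
    conn.commit()
    return conn


def find_repeated_rows(conn):
    """Return (day, payee, amount, count) for rows stored more than once."""
    return conn.execute(
        "SELECT day, payee, amount, COUNT(*) FROM records "
        "GROUP BY day, payee, amount HAVING COUNT(*) > 1 "
        "ORDER BY day, payee, amount"
    ).fetchall()
```

The database answers the "which rows repeat?" question with one GROUP BY query, and the objects on top only need to decide what to do with the answer.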

So my practical advice is to explore this kind of combination early in your design. Maybe a quick hands-on mockup could let you know if it works for you. In my case it has, and I'm now introducing it earlier in my projects.

Cheers,

Offray

PS: It's been a long time since I last wrote, but I have been reading constantly. Nice to be "back" :-)

On 29/04/16 09:28, Joseph Alotta wrote:
Thanks for all the help.

I like the idea of having the code sense the format of the data and acting accordingly.

For separators, I could count the number of each kind of separator in the file and compare it to the number of lines. Say 3 or more separators per line.
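The counting idea above could be sketched as follows (in Python, as a language-neutral illustration; the candidate list and the function name guess_separator are assumptions, not part of any existing loader). It averages the occurrences of each candidate per non-blank line and accepts the most frequent one that reaches the threshold of 3.

```python
def guess_separator(text, candidates=",;\t|", min_per_line=3):
    """Guess the field separator by counting candidates per line.

    Returns the candidate with the highest average count per non-blank
    line, provided that average is at least min_per_line; otherwise None.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    best, best_ratio = None, 0.0
    for sep in candidates:
        ratio = sum(line.count(sep) for line in lines) / len(lines)
        if ratio >= min_per_line and ratio > best_ratio:
            best, best_ratio = sep, ratio
    return best
```

Note that this simple count does not know about quoted fields, so a payee such as "Smith, John" adds to the comma count even when the real separator is something else.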

Then I can parse by columns and look for the dominant data type. For a column that is 60% matching a date type, I can assume it is a date column and the mismatches are headers.

The amount should be numeric.

The payee should be mostly letters, etc.
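A rough sketch of the dominant-type idea, again in Python and under stated assumptions: the two date formats, the "more than half letters" rule for text, and all function names are hypothetical choices for illustration. A column gets the first type that at least 60% of its values match, and the positions of the mismatches are returned so they can be treated as headers.

```python
from datetime import datetime


def looks_like_date(value, formats=("%Y-%m-%d", "%m/%d/%Y")):
    """True if value parses with one of the assumed date formats."""
    for fmt in formats:
        try:
            datetime.strptime(value.strip(), fmt)
            return True
        except ValueError:
            pass
    return False


def looks_like_number(value):
    """True if value parses as a number once thousands commas are removed."""
    try:
        float(value.replace(",", ""))
        return True
    except ValueError:
        return False


def looks_like_text(value):
    """True if more than half of the characters are letters."""
    letters = sum(ch.isalpha() for ch in value)
    return len(value) > 0 and letters / len(value) > 0.5


def dominant_type(values, threshold=0.6):
    """Return (type_name, mismatch_indexes) for one column of strings."""
    checks = (
        ("date", looks_like_date),
        ("number", looks_like_number),
        ("text", looks_like_text),
    )
    for name, check in checks:
        matches = [check(v) for v in values]
        if values and sum(matches) / len(values) >= threshold:
            mismatches = [i for i, ok in enumerate(matches) if not ok]
            return name, mismatches
    return None, []
```

For example, dominant_type(["Date", "2016-04-01", "2016-04-02"]) classifies the column as a date column and reports index 0 as the mismatch, which is the header.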

One issue I have is knowing what to call the object that does this. It would not be a Transaction, because this is a function of many Transactions.

FileLoader?  FileAnalyzer?

Also, at this point I should be looking for missing dates and duplicates.

Duplicates are troublesome, since every time I download the file, it starts from the beginning of the year again. I keep downloading them because I think they will only keep data for 6 months or so.

Also duplicate transactions are valid. Suppose I go into a coffee shop and buy a cup of coffee, then go back the same day, same store for a refill.
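One possible way to reconcile both points, sketched in Python: treat each download as a multiset of rows and, for every distinct row, keep the largest count seen in any single download. This assumes every download is complete for the period it covers, so a legitimate repeat (the coffee refill) appears twice inside one file, while a re-downloaded row appears once per file. The function name merge_downloads and the tuple layout are hypothetical.

```python
from collections import Counter


def merge_downloads(*downloads):
    """Merge overlapping downloads without losing legitimate repeats.

    Each download is a list of hashable rows, e.g. (day, payee, amount)
    tuples. Counter union keeps, per row, the maximum count found in any
    one download, so overlaps between files are not counted twice.
    """
    merged = Counter()
    for rows in downloads:
        merged |= Counter(rows)
    return merged
```

The result maps each distinct row to how many times it really occurred; merged.elements() expands it back into a flat sequence of transactions if needed.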

Your thoughts?

Sincerely,

Joe.



------------------------------------------------------------------------
View this message in context: Re: conceptual design help <http://forum.world.st/conceptual-design-help-tp4892763p4892966.html> Sent from the Squeak - Beginners mailing list archive <http://forum.world.st/Squeak-Beginners-f107673.html> at Nabble.com.


_______________________________________________
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners
