Hey everyone,

I know it's been mentioned here and there, but now it's official: two new 
packages, DataStreams.jl and CSV.jl, have been released for Julia 0.4. 
SQLite.jl has also gone through a big overhaul to modernize the code and 
rework its data processing interface.

DataStreams.jl is a new package with a lofty goal and not a lot of code. It 
aims to put forth a data ingestion/processing framework that can be used by 
any package that reads, ingests, or writes data (sources, sinks, readers, 
writers, etc.). The basic idea is that, for a given kind of data source, a 
package defines `Source` and `Sink` types and then implements whichever 
combinations of `Data.stream!(::Source, ::Sink)` methods make sense. For 
example, CSV.jl and SQLite.jl now both have `Source` and `Sink` types, and 
I've simply defined the following methods between the two packages:

Data.stream!(source::CSV.Source, sink::SQLite.Sink)  =>  parse a CSV file 
    represented by `source` directly into the SQLite table represented by 
    `sink`
Data.stream!(source::SQLite.Source, sink::CSV.Sink)  =>  fetch the SQLite 
    table represented by `source` directly out to a CSV file represented by 
    `sink`
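To make the dispatch idea concrete, here's a minimal, self-contained sketch 
in current Julia (1.x) syntax. Every name in it (`Source`, `Sink`, 
`TextSource`, `RowSource`, `RowSink`, `TextSink`, `stream!`) is a 
hypothetical stand-in, not the actual DataStreams.jl, CSV.jl, or SQLite.jl 
API; it only illustrates the pattern of writing one `stream!` method per 
(source, sink) pair and letting multiple dispatch pick the right one.

```julia
# Hypothetical stand-ins; NOT the real DataStreams.jl/CSV.jl/SQLite.jl types.
abstract type Source end
abstract type Sink end

# A "CSV-like" source: delimited lines held in memory.
struct TextSource <: Source
    lines::Vector{String}
    delim::Char
end

# A "table-like" source and sink: a vector of rows.
struct RowSource <: Source
    rows::Vector{Vector{String}}
end

struct RowSink <: Sink
    rows::Vector{Vector{String}}
end
RowSink() = RowSink(Vector{String}[])   # start with no rows

# A "CSV-like" sink: delimited text written to any IO (here an IOBuffer).
struct TextSink <: Sink
    io::IO
    delim::Char
end

# One method per (Source, Sink) pair that makes sense.
function stream!(source::TextSource, sink::RowSink)
    for line in source.lines
        # split each line on the delimiter and store it as one row
        push!(sink.rows, String.(split(line, source.delim)))
    end
    return sink
end

function stream!(source::RowSource, sink::TextSink)
    for row in source.rows
        # join each row back into one delimited line
        println(sink.io, join(row, sink.delim))
    end
    return sink
end

parsed = stream!(TextSource(["a,b", "1,2"], ','), RowSink())
parsed.rows                      # [["a", "b"], ["1", "2"]]

written = stream!(RowSource(parsed.rows), TextSink(IOBuffer(), ','))
String(take!(written.io))        # "a,b\n1,2\n"
```

A pair with no method defined, e.g. `stream!(::RowSource, ::RowSink)` in 
this sketch, simply throws a `MethodError`, so each package only has to 
implement the combinations that make sense for it.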

The DataStreams.jl package also defines a `Data.Table` type which is simply:

type Table{T}
    schema::Data.Schema
    data::T
end

This is meant as a "backend-agnostic" type that represents an in-memory 
Julia structure. Currently the default constructors put a 
`Vector{NullableVector}` in the `.data` field, but it could really be 
anything you want (e.g. a DataFrame, a Matrix, etc.). The aim of 
`Data.Table` certainly isn't to replace something like DataFrames, but 
rather to act as a default "pure Julia type" within the DataStreams.jl 
framework. Indeed, a non-copying conversion of a `Data.Table` to a 
`DataFrame` is just `DataFrame(dt::Data.Table)`.
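Here's a similarly hypothetical sketch (again Julia 1.x syntax, with made-up 
`Schema` and `Table` types that only mirror the shape of `Data.Table` above, 
not the package's actual definitions) of why parameterizing on the storage 
type `T` keeps the wrapper backend-agnostic: the same type can hold a vector 
of columns or a matrix, it stores a reference rather than a copy, and code 
can dispatch on the backend when it needs to.

```julia
# Hypothetical stand-ins that mirror the shape of `Data.Table`;
# they are NOT the DataStreams.jl definitions.
struct Schema
    header::Vector{String}
    types::Vector{DataType}
end

mutable struct Table{T}   # `mutable struct` is the 1.x spelling of 0.4's `type`
    schema::Schema
    data::T
end

schema = Schema(["id", "name"], [Int, String])

# Column storage: one vector per column. The announced default uses
# NullableVectors; plain Vectors keep this sketch dependency-free.
ids = [1, 2]
cols = Table(schema, Any[ids, ["a", "b"]])   # Table{Vector{Any}}

# The same wrapper around a completely different backend.
mat = Table(schema, Any[1 "a"; 2 "b"])       # Table{Matrix{Any}}

# Wrapping stores a reference to the column, not a copy.
cols.data[1] === ids                         # true

# Code can specialize on the storage type when it needs to.
ncols(t::Table{<:AbstractVector}) = length(t.data)
ncols(t::Table{<:AbstractMatrix}) = size(t.data, 2)

ncols(cols), ncols(mat)                      # (2, 2)
```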

You can see more details in the blog post I wrote up 
here: http://julialang.org/blog/2015/10/datastreams/

A big thanks as well to the number of people who have helped encourage and 
develop these packages with me. I truly love the community and the caliber 
of people around here and just want to say thanks.

DataStreams.jl: https://github.com/JuliaDB/DataStreams.jl
CSV.jl: https://github.com/JuliaDB/CSV.jl
SQLite.jl: https://github.com/JuliaDB/SQLite.jl

-Jacob
