Hey everyone,
I know it's been mentioned here and there, but now it's official: two new
packages have been released for Julia 0.4, DataStreams.jl and CSV.jl.
SQLite.jl has also gone through a big overhaul to modernize the code and
rework the data processing interface.
DataStreams.jl is a new package with a lofty goal and not a lot of code. It
aims to put forth a data ingestion/processing framework that can be used by
all types of data-reader/ingestion/source/sink/writer type packages. The
basic idea is that a package for a given kind of data source defines
`Source` and `Sink` types and then implements whichever combinations of
`Data.stream!(::Source, ::Sink)` methods make sense. For example,
CSV.jl and SQLite.jl now both have `Source` and `Sink` types, and I've
simply defined the following methods between the two packages:

    Data.stream!(source::CSV.Source, sink::SQLite.Sink)
      => parse a CSV file represented by `source` directly into the
         SQLite table represented by `sink`

    Data.stream!(source::SQLite.Source, sink::CSV.Sink)
      => fetch the SQLite table represented by `source` directly out to
         a CSV file represented by `sink`

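To make the dispatch pattern concrete, here is a minimal toy sketch of it. This is not the DataStreams.jl API: `LineSource`, `RowSink`, `TextSink` and this standalone `stream!` are hypothetical names made up for illustration, and the code uses current Julia syntax (`struct`, `abstract type`) rather than the 0.4-era `type` keyword.

```julia
# Toy model of the Source/Sink idea. All names here are hypothetical and
# are NOT part of DataStreams.jl, CSV.jl or SQLite.jl.
abstract type Source end
abstract type Sink end

# Hypothetical source: rows held as comma-separated lines of text.
struct LineSource <: Source
    lines::Vector{String}
end

# Hypothetical sink: collects parsed rows in memory.
struct RowSink <: Sink
    rows::Vector{Vector{String}}
end
RowSink() = RowSink(Vector{Vector{String}}())

# Hypothetical sink: writes rows back out as delimited text.
struct TextSink <: Sink
    io::IOBuffer
    delim::Char
end

# One method per (source, sink) combination that makes sense;
# multiple dispatch picks the right one from the argument types.
function stream!(source::LineSource, sink::RowSink)
    for line in source.lines
        push!(sink.rows, String.(split(line, ',')))
    end
    return sink
end

function stream!(source::LineSource, sink::TextSink)
    for line in source.lines
        println(sink.io, join(split(line, ','), sink.delim))
    end
    return sink
end

src  = LineSource(["id,name", "1,ada", "2,grace"])
rows = stream!(src, RowSink())                 # parsed rows in memory
text = stream!(src, TextSink(IOBuffer(), '\t')) # tab-delimited text
```

Adding a new destination then only means defining another `Sink` subtype and the `stream!` methods for the sources it should accept; existing sources don't change.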
The DataStreams.jl package also defines a `Data.Table` type which is simply:

    type Table{T}
        schema::Data.Schema
        data::T
    end

This is meant as a "backend-agnostic" kind of type that represents an
in-memory Julia structure. Currently the default constructors put a
`Vector{NullableVector}` in the `.data` field, but it could really be
anything you want (e.g. DataFrame, Matrix, etc.). The aim of `Data.Table`
certainly isn't to replace something like DataFrames, but rather to act as
a default "pure Julia type" within the DataStreams.jl framework. Indeed, a
non-copying conversion of a `Data.Table` to a `DataFrame` is just:
`DataFrame(dt::Data.Table)`.
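As a rough illustration of what "backend-agnostic" and "non-copying" mean here, the sketch below wraps two different storage types in the same parametric container and then hands the column storage to another type without copying it. `ToySchema`, `ToyTable` and `ToyFrame` are hypothetical stand-ins (the fields of the real `Data.Schema` are not shown in this post), and the code is written in current Julia syntax.

```julia
# Hypothetical stand-in for Data.Schema; the real type's fields are assumed.
struct ToySchema
    header::Vector{String}
    types::Vector{DataType}
end

# Same shape as the Table{T} definition above: a schema plus any storage T.
struct ToyTable{T}
    schema::ToySchema
    data::T
end

# Hypothetical stand-in for a DataFrame-like type.
struct ToyFrame
    names::Vector{String}
    columns::Vector{Any}
end

# "Convert" by reusing the same column storage rather than copying it.
ToyFrame(t::ToyTable{Vector{Any}}) = ToyFrame(t.schema.header, t.data)

schema = ToySchema(["id", "score"], [Int, Float64])

cols = ToyTable(schema, Any[[1, 2, 3], [0.5, 1.5, 2.5]]) # column-vector backend
mat  = ToyTable(schema, [1.0 0.5; 2.0 1.5])              # Matrix backend

frame = ToyFrame(cols)
# frame.columns and cols.data are the same object, so no data was copied.
```

Because `data` is just a type parameter, swapping the in-memory backend does not require touching the schema or the code that produces the table.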
You can see more details in the blog post I wrote up
here: http://julialang.org/blog/2015/10/datastreams/
A big thanks as well to the many people who have helped encourage and
develop these packages with me. I truly love the community and caliber of
people around here and just want to say thanks.
DataStreams.jl: https://github.com/JuliaDB/DataStreams.jl
CSV.jl: https://github.com/JuliaDB/CSV.jl
SQLite.jl: https://github.com/JuliaDB/SQLite.jl
-Jacob