Hi all,

I just tagged Query.jl v0.1.0, and with that my little summer project should be 
ready for wider consumption.


Query.jl aims to eventually be the equivalent of LINQ or dplyr for Julia. It
provides a unified way to query many different data sources, the most prominent
being DataFrames.


You can find more information and documentation at 
https://github.com/davidanthoff/Query.jl.


You should consider the package to be in beta at the moment: it is more or less
feature complete and functional for a first release, but it hasn't been tested
widely. Please do take it for a spin and report back any bugs, usability
issues, etc.


I'm also keenly interested in collaborators. This is an ambitious project, and 
any help would be greatly appreciated. PRs are welcome!


Finally, the package builds on a lot of previous work. I want to highlight a
few: C# LINQ (the basis for the whole package design), the NamedTuples package
(I couldn't have built Query without it) and the DataStreams ecosystem, which
enabled rapid integration of a large number of sources and sinks. And of course
Julia itself. It is quite amazing how simple it was to take a really complex
design like LINQ and port it to Julia. Not only did I never run into a language
limitation in the process, but Julia actually enabled a couple of really neat
features in Query.jl that would not have been possible in C#.


Here are some other highlights of the package:

- Query.jl is an almost complete implementation of the query expression section
of the C# specification, with some additional Julia-specific features.
- The package supports a large number of data sources: DataFrames, TypedTables, 
normal arrays, any DataStream source (this includes CSV, Feather, SQLite), 
NDSparseData structures and any type that can be iterated.
- The results of a query can be materialized into a range of different data 
structures: iterators, DataFrames, arrays or any DataStream sink (this includes 
CSV and Feather files).
- One can mix and match almost all sources and sinks within one query. For 
example, one can easily perform a join of a DataFrame with a CSV file and write 
the results into a Feather file, all within one query.
- The type instability problems that one can run into with DataFrames do not
affect Query.jl: queries against DataFrames are completely type stable.
- There are three different APIs that package authors can use to make their
data sources queryable with this package. The simplest API only requires a
data source to provide an iterator. Another API hands the data source a
complete graph representation of the query; the data source can then, e.g.,
rewrite that query graph as a SQL statement and execute it. The final API
allows a data source to provide its own data structures to represent a
query graph.
- The package is completely documented.

Have fun and please report back any issues you run into.

Best,
David


-- 
You received this message because you are subscribed to the Google Groups 
"julia-stats" group.