Hi Matei,
I'm afraid I haven't had enough time to focus on this, as work has just been
crazy. It's still something I want to get to a mergeable state.
Actually, it was working fine; it was just a bit rough and needs to be updated
to HEAD.
I'll absolutely try my utmost to get something

Hey Nick, no worries if this can’t be done in time. It’s probably better to
test it thoroughly. If you do have something partially working though, the main
concern will be the API, i.e. whether it’s an API we want to support
indefinitely. It would be bad to add this and then make major changes

OK, I'll work something up and reopen a PR against the new Spark mirror.
The API itself mirrors newHadoopFile and the related methods, so it should be
quite stable once finalised.
It's the wrapper code, i.e. how to serialize custom classes and read them in
Python, that is the potentially tricky

Hey Nick, I’m curious, have you been doing any further development on this? It
would be good to get expanded InputFormat support in Spark 1.0. To start with,
we don’t have to do SequenceFiles in particular; we can do something like Avro
(if it’s easy to read in Python) or some kind of