On Apr 23, 2010, at 12:33pm, Doug Cutting wrote:
> Ken Krugler wrote:
>> 1. I'm assuming there's no compelling reason to read the file
>> headers - in fact, not sure how you'd even get at the data, much
>> less how you'd deal with potentially partial/missing data from a
>> set of Avro files being read as part files.
>
> I'm not sure what you're asking here.

Sorry, I should have been clearer.
I was thinking about the read side of things, when using the Cascading
Scheme to pull data from Avro files. If these files have metadata,
there's no good way to get at it via the Cascading interface, and
given that a directory will typically contain a set of part-xxxxx
files, it didn't seem like you could do much with the results in any
case. So just checking to make sure I wasn't overlooking something.

>> 2. We'd like to not include Avro source in the Cascading scheme
>> project, but rather just have a dependency on the Avro jar.
>>
>> We have a similar relationship between Bixo and Tika, and what's
>> worked well is for the Bixo master branch to have a dependency on
>> the Tika snapshot builds, so we can quickly iterate on both projects.
>>
>> So are there plans to start pushing Avro snapshot builds to the
>> Apache snapshots repository? I see occasional Avro releases to the
>> Maven central repo (1.0, 1.2, 1.3.2) but nothing for snapshots.
>
> I'm okay if someone wants to, e.g., configure a nightly Hudson build
> that pushes out an Avro snapshot jar. Apache releases should not
> depend on snapshots, but snapshots are useful for development.
>
> Avro's build.xml already includes a task to post a snapshot jar. I
> tested it once, which accounts for the single Avro snapshot that
> exists. So it should be simple to configure Hudson to do this.
>
> Philip was going to set up Hudson builds for Avro. Philip?

That would be great, thanks!
-- Ken
--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g