On Apr 23, 2010, at 12:33pm, Doug Cutting wrote:

> Ken Krugler wrote:
>> 1. I'm assuming there's no compelling reason to read the file headers - in fact, not sure how you'd even get at the data, much less how you'd deal with potentially partial/missing data from a set of Avro files being read as part files.
>
> I'm not sure what you're asking here.

Sorry, I should have been clearer.

I was thinking about the read side of things, when using the Cascading Scheme to pull data from Avro files. If these files have metadata, there's no good way to get at it via the Cascading interface, and given that a directory will typically contain a set of part-xxxxx files, it didn't seem like you could do much with the results in any case. So just checking to make sure I wasn't overlooking something.

>> 2. We'd like to not include Avro source in the Cascading scheme project, but rather just have a dependency on the Avro jar. We have a similar relationship between Bixo and Tika, and what's worked well is for the Bixo master branch to have a dependency on the Tika snapshot builds, so we can quickly iterate on both projects. So are there plans to start pushing Avro snapshot builds to the Apache snapshots repository? I see occasional Avro releases to the Maven central repo (1.0, 1.2, 1.3.2) but nothing for snapshots.
>
> I'm okay if someone wants to, e.g., configure a nightly Hudson build that pushes out an Avro snapshot jar. Apache releases should not depend on snapshots, but snapshots are useful for development.
>
> Avro's build.xml already includes a task to post a snapshot jar. I tested it once, which accounts for the single Avro snapshot that exists. So it should be simple to configure Hudson to do this. Philip was going to set up Hudson builds for Avro. Philip?

That would be great, thanks!

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g



