All,

As part of moving ORC out of Hive, we pulled all of the vectorization storage and sarg classes into a separate module named storage-api. Although it is currently used only by ORC, Parquet or Avro could also use it to build a fast vectorized reader that reads directly into Hive's VectorizedRowBatch without needing a shim or a data copy. In many ways this is similar to pulling the Arrow project out of Drill.
Unfortunately, this still leaves us with a circular dependency between Hive and ORC. I had hoped that storage-api wouldn't change much, but that doesn't seem to be the case. As a result, ORC ends up shipping its own fork of storage-api. Although we could create a new project just for storage-api, I think it would be better to make it a subproject of Hive that is released independently. What do others think?

Owen