[
https://issues.apache.org/jira/browse/PARQUET-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Halliday updated PARQUET-1953:
----------------------------------
Description:
parquet-hadoop provides the only mechanism for loading .parquet files, and it
declares an optional (provided) dependency on hadoop-common, implying that
parquet-hadoop can be used without Hadoop. In practice, however, hadoop-common
is required at runtime.
The following code is needed to instantiate a ParquetFileReader:
{code}
import java.io.{File, FileInputStream}

import org.apache.parquet.io.{DelegatingSeekableInputStream, InputFile, SeekableInputStream}

final class LocalInputFile(file: File) extends InputFile {
  def getLength(): Long = file.length()
  def newStream(): SeekableInputStream = {
    val input = new FileInputStream(file)
    new DelegatingSeekableInputStream(input) {
      def getPos(): Long = input.getChannel().position()
      def seek(bs: Long): Unit = {
        val _ = input.getChannel().position(bs)
      }
    }
  }
}
{code}
but using it without hadoop-common on the classpath leads to a runtime
exception: parquet-hadoop references org.apache.hadoop.fs.PathFilter, which in
turn depends on org.apache.hadoop.fs.Path, and both classes live in
hadoop-common.
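Note that the I/O mechanism the wrapper above relies on is pure JDK and works with no Hadoop classes at all: repositioning a FileInputStream's channel moves the stream's read position. A minimal stdlib-only sketch (class name SeekDemo is my own, not from parquet-hadoop):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SeekDemo {
    public static void main(String[] args) throws IOException {
        // Write a temp file with known contents.
        Path p = Files.createTempFile("seekdemo", ".bin");
        Files.write(p, "abcdefghij".getBytes(StandardCharsets.US_ASCII));
        try (FileInputStream in = new FileInputStream(p.toFile())) {
            // "seek": reposition the underlying channel, exactly as the
            // DelegatingSeekableInputStream wrapper does.
            in.getChannel().position(5);
            // The stream now reads from offset 5.
            System.out.println((char) in.read());
            // "getPos": the channel position has advanced past the read byte.
            System.out.println(in.getChannel().position());
        }
        Files.delete(p);
    }
}
```

This is why requiring hadoop-common for local reads is purely an accident of class references, not a functional necessity.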
hadoop-common is an extremely large dependency, and I would rather that
downstream users were not forced to pull it in.
A search for "import org.apache.hadoop" in src/main reveals a few more places
where the dependency is hardwired, although these are often in deprecated
static constructors and therefore benign.
> hadoop-common is not an optional dependency
> -------------------------------------------
>
> Key: PARQUET-1953
> URL: https://issues.apache.org/jira/browse/PARQUET-1953
> Project: Parquet
> Issue Type: Bug
> Reporter: Sam Halliday
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)