Dear all,

as part of the HadoopOffice library (
https://github.com/zuinnote/hadoopoffice/wiki) we provide the functionality
to read office documents, such as MS Excel, on Big Data platforms, such as
Hadoop/Hive/Spark/Flink.

I want to release a new version supporting POI 4.0.0, but I have one
remaining blocking issue: the Big Data platforms ship an old version of
commons-compress (between 1.4.x and 1.9.x). As a result, I always run into
the exception thrown by ZipArchiveThresholdInputStream: "InputStream of class
[..] is not implementing InputStreamStatistics" (
https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/openxml4j/util/ZipArchiveThresholdInputStream.java?view=markup&pathrev=1832789
).
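
For anyone who wants to reproduce this, a minimal sketch along the following
lines can be run inside a job to see which commons-compress the platform
actually loads. The class name ClasspathCheck and its helper methods are
hypothetical and only for illustration; the only assumption about
commons-compress is that InputStreamStatistics lives in
org.apache.commons.compress.utils in the newer releases.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of POI or HadoopOffice.
class ClasspathCheck {

    // Returns true if the named class can be loaded by this class loader.
    static boolean isPresent(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the jar/directory the class was loaded from, or a placeholder text.
    static String locationOf(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not found)";
        }
    }

    public static void main(String[] args) {
        String stats = "org.apache.commons.compress.utils.InputStreamStatistics";
        String zip = "org.apache.commons.compress.archivers.zip.ZipArchiveInputStream";
        System.out.println("InputStreamStatistics present: " + isPresent(stats));
        System.out.println("ZipArchiveInputStream loaded from: " + locationOf(zip));
    }
}
```

On the affected platforms I would expect the first line to report false and
the second to point at the platform's own commons-compress jar rather than
the one bundled with the application.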

Unfortunately, updating these platforms to the latest commons-compress is
very intrusive and not possible for many organizations. I now need to find
a workaround for this. Alternative classpath settings do not work very
well and create another mess.

Do you have any idea how I can deal with this check? Can I somehow
inject InputStreamStatistics into my InputStream? Or can I somehow inject my
own ZipArchiveInputStream?
Alternatively, instead of using ZipArchiveInputStream, could Apache POI
create another class, POIZipArchiveInputStream, and let this custom class
extend ArchiveInputStream and implement InputStreamStatistics? This would
remove all my classpath issues with the Big Data platforms.


Thank you.

Best regards
