Dear all,

As part of the HadoopOffice library (https://github.com/zuinnote/hadoopoffice/wiki), we provide functionality to read office documents, such as MS Excel files, on Big Data platforms such as Hadoop/Hive/Spark/Flink.
I want to release a new version that supports POI 4.0.0, but one blocking issue remains: the Big Data platforms ship an old version of commons-compress (between 1.4.x and 1.9.x). As a result, I always run into the exception "InputStream of class [..] is not implementing InputStreamStatistics" in ZipArchiveThresholdInputStream ( https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/openxml4j/util/ZipArchiveThresholdInputStream.java?view=markup&pathrev=1832789 ).

Unfortunately, updating these platforms to the latest commons-compress is very intrusive and, for many organizations, not possible, so I now need to find a workaround. Alternative classpath settings do not work well and create a mess of their own.

Do you have any idea how I can deal with this check? Can I somehow inject InputStreamStatistics into my InputStream, or inject my own ZipArchiveInputStream? Alternatively, could Apache POI, instead of using ZipArchiveInputStream directly, provide its own class POIZipArchiveInputStream that extends ArchiveInputStream and implements InputStreamStatistics? This would remove all my classpath issues with the Big Data platforms.

Thank you.
Best regards
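
P.S. To make the last idea a bit more concrete, here is a rough sketch of the kind of class I have in mind. It is only an illustration and uses nothing but the JDK: `PoiInputStreamStatistics`, `CountingInputStream`, `PoiZipInputStream` and `PoiZipDemo` are hypothetical names, not existing POI or commons-compress classes. The interface merely mirrors the two getters of commons-compress' InputStreamStatistics. The counters here are cumulative over the whole stream, and the compressed count is only an upper bound because `java.util.zip.ZipInputStream` reads ahead in chunks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical stand-in mirroring the two getters of InputStreamStatistics. */
interface PoiInputStreamStatistics {
    long getCompressedCount();
    long getUncompressedCount();
}

/** Counts the raw (still compressed) bytes pulled from the wrapped stream. */
class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b >= 0) count++;
        return b;
    }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0) count += n;
        return n;
    }

    long getCount() { return count; }
}

/** Hypothetical zip stream that carries its own statistics (assumption, not POI API). */
class PoiZipInputStream extends ZipInputStream implements PoiInputStreamStatistics {
    private final CountingInputStream raw;
    private long uncompressed;

    private PoiZipInputStream(CountingInputStream raw) {
        super(raw);
        this.raw = raw;
    }

    static PoiZipInputStream of(InputStream in) {
        return new PoiZipInputStream(new CountingInputStream(in));
    }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0) uncompressed += n; // bytes handed to the caller after inflating
        return n;
    }

    // Upper bound: includes headers and bytes read ahead by ZipInputStream.
    @Override public long getCompressedCount() { return raw.getCount(); }

    @Override public long getUncompressedCount() { return uncompressed; }
}

class PoiZipDemo {
    /** Returns {compressedCount, uncompressedCount} after reading a small in-memory zip. */
    static long[] demo() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(bos)) {
                zos.putNextEntry(new ZipEntry("sheet.xml"));
                zos.write(new byte[1000]); // 1000 highly compressible bytes
                zos.closeEntry();
            }
            try (PoiZipInputStream zis =
                     PoiZipInputStream.of(new ByteArrayInputStream(bos.toByteArray()))) {
                byte[] buf = new byte[256];
                while (zis.getNextEntry() != null) {
                    while (zis.read(buf, 0, buf.length) != -1) {
                        // drain the entry so both counters advance
                    }
                }
                return new long[] { zis.getCompressedCount(), zis.getUncompressedCount() };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] counts = demo();
        System.out.println("compressed (upper bound): " + counts[0]);
        System.out.println("uncompressed: " + counts[1]);
    }
}
```

A check like the one in ZipArchiveThresholdInputStream could then test against an interface that POI itself owns, so the statistics would no longer depend on which commons-compress version the platform puts on the classpath. Whether this fits POI's design is of course your call.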