https://issues.apache.org/bugzilla/show_bug.cgi?id=52211
Yegor Kozlov <ye...@dinom.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #9 from Yegor Kozlov <ye...@dinom.ru> ---
It is very likely that your hypothesis is correct and that this line of code
can cause problems.

The problematic piece of code has existed since POI-3.5, when OpenXml4j was
contributed to Apache POI. I guess the intention was to ensure that the string
being parsed and validated is in the ASCII encoding. This "worked" for years,
but the conversion does not make sense: if the input argument contains
characters above ASCII, they are converted to U+FFFD (the Unicode replacement
character) and the subsequent validation against the patternMediaType regex
fails.

Consider the following examples:

 (a) new ContentType("text/\u007E")
 (b) new ContentType("text/\u0080")

The first case (a) works because all characters in the input string are in
ASCII and the conversion does not change the input string.

The second case (b) fails whether or not the input argument is re-converted to
US-ASCII. If you apply your fix (contentTypeASCII=contentType), then the regex
check at line 146 fails. The current code first converts the input string to
"text/\uFFFD" and then the regex fails.

So I agree that this conversion is redundant and can be removed. The fix is
coming soon.

Regards,
Yegor

(In reply to comment #8)
> Hello,
>
> We are using the POI API (stable 3.8) on a system running ibm500 encoding as
> default encoding.
> So we got the same error, when trying to create a Workbook using
> WorkbookFactory.create(ByteArrayInputStream bais).
>
> We found that the problem lies in the method
> org.apache.poi.openxml4j.opc.internal.ContentType.ContentType(String
> contentType)
>
> In line 139, the following code is called:
> contentTypeASCII = new String(contentType.getBytes(), "US-ASCII");
>
> The String.getBytes() causes the system to return the bytes in default
> system encoding (for instance ibm500). Afterwards this should be converted
> using encoding US-ASCII. This cannot work.
>
> So, we wonder, why this conversion will be done?
>
> We deleted the line and just put following code:
> contentTypeASCII = contentType;
>
> Afterwards it worked fine.
>
> Regards
> Constantin
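
For anyone who wants to reproduce the effect described in comments #8 and #9 without changing the JVM's default encoding, here is a minimal sketch. It is not POI code: the class and method names (AsciiRoundTrip, roundTrip) are hypothetical, and the platform default charset is passed in explicitly as a stand-in for what String.getBytes() would pick up. IBM500 is only exercised if the running JVM happens to provide that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI: imitates
//   new String(contentType.getBytes(), "US-ASCII")
// with the "default" charset made explicit.
class AsciiRoundTrip {

    static String roundTrip(String s, Charset assumedDefault) {
        // Encode with the assumed platform charset, then decode as US-ASCII.
        // Bytes that are not valid US-ASCII decode to the replacement character.
        byte[] bytes = s.getBytes(assumedDefault);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Case (a): pure ASCII survives on an ASCII-compatible default charset.
        String a = "text/\u007E";
        System.out.println(a.equals(roundTrip(a, StandardCharsets.ISO_8859_1)));

        // Case (b): a character above ASCII is altered by the round trip.
        String b = "text/\u0080";
        System.out.println(b.equals(roundTrip(b, StandardCharsets.ISO_8859_1)));

        // Comment #8: with an EBCDIC default, even a plain ASCII content type
        // does not survive. Guarded because IBM500 is an optional charset.
        if (Charset.isSupported("IBM500")) {
            String xml = "text/xml";
            System.out.println(xml.equals(roundTrip(xml, Charset.forName("IBM500"))));
        }
    }
}
```

With the conversion removed, as proposed above, case (a) and the EBCDIC case are passed to the regex unchanged, and case (b) is still rejected by the regex itself.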