I don't think it is related to file size, but rather to the maximum record size handled by POI. It is a protection against OutOfMemoryErrors. I increased this limit to 10M because I was seeing many of these errors. I do not know if it is configurable in Tika Server.
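Below is a minimal sketch, in Python, of running your own Tika Server with a custom config and pointing tika-python at it. It assumes tika-server 1.x's `--config` and `--port` command-line options and tika-python's `TIKA_CLIENT_ONLY` and `TIKA_SERVER_ENDPOINT` environment variables. The jar and config file names are hypothetical. This thread does not name the config property that would raise the POI limit, so the config file's contents are not shown.

```python
import os
from typing import Dict, List

# Hypothetical file names; adjust to your own setup.
TIKA_SERVER_JAR = "tika-server-1.22.jar"
TIKA_CONFIG = "tika-config.xml"
DEFAULT_PORT = 9998  # tika-python's default endpoint port


def build_server_command(jar_path: str, config_path: str, port: int = DEFAULT_PORT) -> List[str]:
    """Command line for a self-managed Tika Server using a custom config.

    Assumption: tika-server 1.x accepts --config and --port options.
    """
    return ["java", "-jar", jar_path, "--config", config_path, "--port", str(port)]


def client_env(port: int = DEFAULT_PORT) -> Dict[str, str]:
    """Copy of the current environment that makes tika-python act as a pure
    REST client of an already-running server instead of starting its own."""
    env = dict(os.environ)
    env["TIKA_CLIENT_ONLY"] = "True"
    env["TIKA_SERVER_ENDPOINT"] = f"http://localhost:{port}"
    return env


if __name__ == "__main__":
    # Only prints the command; launching it requires Java and the jar.
    print(" ".join(build_server_command(TIKA_SERVER_JAR, TIKA_CONFIG)))
```

To use this, start the server with `subprocess.Popen(build_server_command(...))`, then run the client process with `client_env()` as its environment. tika-python appears to read these variables when the module is imported, so set them before `from tika import parser`.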
Regards,
Luis

On Tue, Oct 8, 2019 at 17:46, Chris Mattmann <[email protected]> wrote:
> Hi,
>
> Thanks for your question. Yes, the same way you set the byte size property
> in Tika-App (I think through parser configuration) is how you would do it
> for Tika-Server. You would start the Tika Server yourself with a custom
> config file that sets this property, on the default port (making sure any
> other instances were killed first). Tika-Python will then use your own
> Tika Server with the custom config.
>
> As for catching errors, it tries its best, but it does not catch all of
> them. If you find something it doesn't catch, let us know and we will
> work to fix it.
>
> Thanks,
>
> Chris
>
> From: "[email protected]" <[email protected]>
> Organization: Avident-IT
> Date: Tuesday, October 8, 2019 at 6:06 AM
> To: "Mattmann, Chris A (US 1761)" <[email protected]>
> Subject: [EXTERNAL] Tika Python questions
>
> Hi
>
> I have had the pleasure of testing the Tika-python library. I am testing
> it in a new application that is being developed for customers.
>
> It has very good performance, especially for parsing XLSX and XLS files.
>
> However, I have two questions:
>
> The Tika-Server only handles files up to a maximum byte size. I get this
> error:
>
> org.apache.poi.util.RecordFormatException: Tried to allocate an array of
> length 1186956, but 1000000 is the maximum for this record type.
> increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with
> IOUtils.setByteArrayMaxOverride()
>
> I have tried the Tika-App (jar file), and it does handle files larger
> than 1000000 bytes.
>
> The Tika documentation says to set MaxBytes to -1 to override the limit
> and handle larger files.
>
> Is there any way to handle this via Tika-Python, i.e. to set the maximum
> file size to unlimited the way Tika-App does?
>
> How can I catch errors via the Tika-python library, e.g. if files are
> encrypted, corrupt, etc.?
>
> Kind regards
>
> HANS MEIJER
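Here is one possible approach to the error-handling question, sketched in Python. tika-python's `parser.from_file()` returns a dict that carries the Tika Server HTTP `status` alongside `content` and `metadata`. That means problems such as encrypted or corrupt files show up mainly as non-200 status codes rather than as Python exceptions. The status-to-cause mapping below is an assumption based on Tika Server's documented response codes: 422 for encrypted or unsupported documents and 500 for other processing errors. `describe_tika_result` and `safe_parse` are hypothetical helpers and are not part of tika-python.

```python
from typing import Any, Callable, Mapping, Tuple


def describe_tika_result(parsed: Mapping[str, Any]) -> str:
    """Classify the dict returned by tika-python's parser.from_file().

    Assumption: the HTTP status is stored under "status"; 422 means an
    encrypted/unsupported document and 5xx a failure while processing.
    """
    status = parsed.get("status")
    if status is None:
        return "no response"  # nothing usable came back from the server
    if status == 200:
        # A 200 with no text can still mean e.g. an image-only document.
        return "ok" if parsed.get("content") else "ok, but no text extracted"
    if status == 204:
        return "empty result"
    if status == 422:
        return "unprocessable (e.g. encrypted or unsupported)"
    if status >= 500:
        return "server error (e.g. corrupt file or parser failure)"
    return f"unexpected status {status}"


def safe_parse(parse_fn: Callable[[str], Mapping[str, Any]], path: str) -> Tuple[str, Mapping[str, Any]]:
    """Call parse_fn (intended: tika.parser.from_file) and never raise.

    Client-side failures (e.g. connection problems) are caught broadly and
    reported as a label; server-side failures are classified by status.
    """
    try:
        parsed = parse_fn(path) or {}
    except Exception as exc:  # broad on purpose: this is a sketch
        return f"client error: {exc}", {}
    return describe_tika_result(parsed), parsed
```

With a running server, this would be called as `safe_parse(parser.from_file, "report.xlsx")`. Keeping the classification separate from the network call makes it easy to log or retry per category.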
