https://bz.apache.org/bugzilla/show_bug.cgi?id=69557

--- Comment #3 from Gonzalo Aguilar Delgado <gaguilardelg...@medallia.com> ---
Hi @PJ Fanning. 

I think this issue goes further than you suggest. How much memory is it
reasonable to spend reading a fairly small 11 MB file?

My testing shows that with a buffer of 150 MB, more than 8 GB of memory is
allocated for just one file.

If you read 4 of these files at once, that is 32 GB just for reading files.
Apache Spark and other projects certainly read more than 4 files in parallel.
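
As a back-of-the-envelope sketch of that arithmetic (the class and method names
are hypothetical and not part of POI; the 8 GB per-file figure is the one from
my testing, and 1 GB is treated as 2^30 bytes):

```java
public class ParallelReadEstimate {
    // Treating 1 GB as 2^30 bytes for this estimate.
    private static final long GB = 1L << 30;

    // Hypothetical helper: total memory if every reader peaks at the same time.
    public static long worstCaseBytes(long perFileBytes, int parallelFiles) {
        return Math.multiplyExact(perFileBytes, (long) parallelFiles);
    }

    public static void main(String[] args) {
        long perFile = 8 * GB; // peak observed for a single 11 MB file
        long total = worstCaseBytes(perFile, 4);
        System.out.println((total / GB) + " GB for 4 parallel reads");
    }
}
```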

Additionally, bumping the max allocated memory to 150 MB will not be enough,
as you can see here:
https://bz.apache.org/bugzilla/show_bug.cgi?id=69558

The library is asking for buffers of about 250 MB (228,086,860), and we have
seen errors asking for 350 MB buffers.

The buffer management doesn't seem sound either: the library appears to
allocate a buffer, read into it, and discard it. If the GC doesn't run in
between, we can end up with 64 GB used just for reading 4 files in parallel.
So the issue doesn't seem minor. I would call it a blocker.
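
For comparison, here is a minimal sketch of reading through one reused,
fixed-size buffer, so that peak buffer memory does not depend on when the GC
runs. It uses plain java.io only and assumes nothing about how POI is actually
implemented; `BoundedRead` and `countBytes` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BoundedRead {
    // Hypothetical helper: consumes the stream through a single reused buffer,
    // so buffering never needs more than bufferSize bytes of heap at a time.
    public static long countBytes(InputStream in, int bufferSize) {
        byte[] buffer = new byte[bufferSize]; // allocated once, reused per read
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n; // a real reader would process buffer[0..n) here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1_000_000]; // stand-in for file contents
        System.out.println(countBytes(new ByteArrayInputStream(data), 8192));
    }
}
```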

Reading in streaming mode is an option too, and we do use it, but some
functionality is not supported there. Leaving the issue as it is also seems to
me a security problem: if selecting the wrong buffer size lets a user bring
the machine down by posting files of just 11 MB, that looks like a DoS
problem.

If the lack of developers is the problem, then we have to leave the ticket
open until someone addresses it. Hiding the problem does not seem like an
option.
Do you agree?

Let's at least report it correctly and leave enough feedback so someone can
tackle it. That's what I'm trying to do. 

I was willing to tackle it myself, but I don't know how much time I can
dedicate to this one. Maybe someone else has more context than I do. In any
case, let's leave the ticket open for a while. Let's see.

Thank you for the response.

