Robert Kruszewski created PARQUET-1261:
------------------------------------------

             Summary: Parquet-format interns strings when reading filemetadata
                 Key: PARQUET-1261
                 URL: https://issues.apache.org/jira/browse/PARQUET-1261
             Project: Parquet
          Issue Type: Bug
    Affects Versions: 1.9.0
            Reporter: Robert Kruszewski


Parquet-format interns strings when deserializing file metadata. The references 
I could find suggest that this was done early on to reduce memory pressure. 
Java (and the JVM in particular) has come a long way since then, and interning 
is now generally discouraged; see 
[https://shipilev.net/jvm-anatomy-park/10-string-intern/] for a good 
explanation. Moreover, since Java 8 string deduplication has been implemented 
at the GC level, per [http://openjdk.java.net/jeps/192]. During our usage and 
testing we found that the interning causes significant GC pressure in 
long-running applications because it enlarges the GC root set.
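
For readers less familiar with interning, here is a minimal sketch of what 
String.intern() does. It is not taken from the parquet-format code; the class 
name InternSketch, the helper names, and the sample value "column_name" are 
made up for illustration.

```java
public class InternSketch {
    // Builds a fresh String object on the heap, so the result is never the
    // pooled compile-time constant. (Hypothetical helper for this sketch.)
    static String fresh(String s) {
        return new String(s.toCharArray());
    }

    // True only when both references point at the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // Two strings with equal contents, as might be read from two files.
        String a = fresh("column_name");
        String b = fresh("column_name");

        System.out.println(a.equals(b));        // prints "true": equal contents
        System.out.println(sameObject(a, b));   // prints "false": two objects

        // intern() returns the canonical instance from the JVM-wide string
        // table, so both calls yield the same reference.
        System.out.println(sameObject(a.intern(), b.intern())); // prints "true"
    }
}
```

Every interned string becomes an entry in that JVM-wide table, which is the 
growth the GC has to deal with. On JVMs that support JEP 192, deduplication 
can instead be requested with -XX:+UseG1GC -XX:+UseStringDeduplication, 
without any call to intern() in application code.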

This issue proposes removing the interning, given that it is questionable 
whether it should be used on modern JVMs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
