I don't see how it would get there. That implies that minimum was null, but
the count was non-zero.

The ColumnStatisticsImpl$StringStatisticsImpl.serialize looks like:

@Override
OrcProto.ColumnStatistics.Builder serialize() {
  OrcProto.ColumnStatistics.Builder result = super.serialize();
  OrcProto.StringStatistics.Builder str =
    OrcProto.StringStatistics.newBuilder();
  if (getNumberOfValues() != 0) {
    str.setMinimum(getMinimum());
    str.setMaximum(getMaximum());
    str.setSum(sum);
  }
  result.setStringStatistics(str);
  return result;
}

and thus shouldn't call down to setMinimum unless the column had at least
one non-null value.

Do you have multiple threads working? There isn't anything that should
be introducing non-determinism, so for the same input it would fail at
the same point.
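One way that state could arise at all: if two threads touch the same
writer, nothing in the statistics update path synchronizes the count
increment with the minimum assignment, so under the Java memory model a
flushing thread may observe the incremented count while still seeing a
stale null minimum. A minimal sketch of the hazard (class and field
names are mine, not Hive's; this is a simplification, not the real
StringStatisticsImpl):

```java
// Hypothetical simplification of the string statistics state -- NOT the
// real Hive class, just a sketch of why unsynchronized sharing is unsafe.
public class StatsVisibilitySketch {
    public static class StringStats {
        public long count;        // stands in for numberOfValues
        public String minimum;    // null until the first non-null value

        // Called once per row by the writing thread. There is no
        // happens-before edge between the two writes below.
        public void update(String value) {
            if (minimum == null || value.compareTo(minimum) < 0) {
                minimum = value;
            }
            count++;
        }

        // Mirrors the guard in serialize(): count != 0 is assumed to
        // imply minimum != null. A second thread, lacking a memory
        // barrier, may see count != 0 but a stale null minimum, and
        // would then hand null to the protobuf builder -> NPE.
        public boolean serializeWouldNpe() {
            return count != 0 && minimum == null;
        }
    }
}
```

Under single-threaded use the invariant (count != 0 implies minimum !=
null) always holds, which would explain why replaying the same data
serially never reproduces the failure.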

.. Owen




On Tue, Sep 1, 2015 at 10:51 PM, David Capwell <dcapw...@gmail.com> wrote:

> We are writing ORC files in our application for hive to consume.
> Given enough time, we have noticed that writing causes an NPE when
> working with a string column's stats.  Not sure what's causing it on
> our side yet, since replaying the same data is just fine; it seems more
> like this just happens over time (different data sources will hit this
> around the same time in the same JVM).
>
> Here is the code in question, and below is the exception:
>
> final Writer writer = OrcFile.createWriter(path,
>     OrcFile.writerOptions(conf).inspector(oi));
> try {
>   for (Data row : rows) {
>     List<Object> struct = Orc.struct(row, inspector);
>     writer.addRow(struct);
>   }
> } finally {
>   writer.close();
> }
>
>
> Here is the exception:
>
> java.lang.NullPointerException: null
>   at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$Builder.setMinimum(OrcProto.java:1803) ~[hive-exec-0.14.0.jar:0.14.0]
>   at org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$StringStatisticsImpl.serialize(ColumnStatisticsImpl.java:411) ~[hive-exec-0.14.0.jar:0.14.0]
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.createRowIndexEntry(WriterImpl.java:1255) ~[hive-exec-0.14.0.jar:0.14.0]
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.createRowIndexEntry(WriterImpl.java:775) ~[hive-exec-0.14.0.jar:0.14.0]
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.createRowIndexEntry(WriterImpl.java:1978) ~[hive-exec-0.14.0.jar:0.14.0]
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1985) ~[hive-exec-0.14.0.jar:0.14.0]
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:322) ~[hive-exec-0.14.0.jar:0.14.0]
>   at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168) ~[hive-exec-0.14.0.jar:0.14.0]
>   at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157) ~[hive-exec-0.14.0.jar:0.14.0]
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2276) ~[hive-exec-0.14.0.jar:
>
>
> Versions:
>
> Hadoop: apache 2.2.0
> Hive Apache: 0.14.0
> Java 1.7
>
>
> Thanks for your time reading this email.
>
