[ https://issues.apache.org/jira/browse/NIFI-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499394#comment-16499394 ]
Otto Fowler commented on NIFI-5259:
-----------------------------------

I see a difference in how the GZIPOutputStream is handled between EventFileCompressor and CompressableRecordWriter, and I wonder whether it matters.

In CompressableRecordWriter, the code comments explicitly call out that NiFi needs to call close() on the GZIPOutputStream:

{code:java}
if (compressed) {
    // because of the way that GZIPOutputStream works, we need to call close() on it in order for it
    // to write its trailing bytes. But we don't want to close the underlying OutputStream, so we wrap
    // the underlying OutputStream in a NonCloseableOutputStream
    // We don't have to check if the writer is dirty because we will have already checked before calling this method.
    if (out != null) {
        out.close();
    }

    if (tocWriter != null && eventId != null) {
        tocWriter.addBlockOffset(rawOutStream.getBytesWritten(), eventId);
    }

    final OutputStream writableStream = new BufferedOutputStream(new GZIPOutputStream(new NonCloseableOutputStream(rawOutStream), 1), 65536);
    this.byteCountingOut = new ByteCountingOutputStream(writableStream, byteOffset);
{code}

But in EventFileCompressor, NiFi does this:

{code:java}
try (final OutputStream ncos = new NonCloseableOutputStream(byteCountingOut);
    final OutputStream gzipOut = new GZIPOutputStream(ncos, 1)) {
    StreamUtils.copy(fis, gzipOut, blockEnd - blockStart);
}
{code}

If I understand correctly, close() will *NOT* be called on the GZIPOutputStream in this situation, so not all of the bytes may be flushed. Is that correct?
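One way to check the question above is a small standalone test. The sketch below is not NiFi code: NonCloseableStandIn and TrackingStream are hypothetical stand-ins, and the real NonCloseableOutputStream is only assumed to behave similarly (flush on close, leave the wrapped stream open). It relies on the Java language rule that try-with-resources closes resources in reverse order of declaration, so gzipOut.close() should run first, followed by ncos.close().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipCloseCheck {

    /** Hypothetical stand-in for NiFi's NonCloseableOutputStream: close() only flushes. */
    static final class NonCloseableStandIn extends FilterOutputStream {
        NonCloseableStandIn(final OutputStream out) {
            super(out);
        }

        @Override
        public void write(final byte[] b, final int off, final int len) throws IOException {
            out.write(b, off, len); // avoid FilterOutputStream's byte-at-a-time default
        }

        @Override
        public void close() throws IOException {
            out.flush(); // deliberately leave the wrapped stream open
        }
    }

    /** In-memory sink that records whether close() ever reached it. */
    static final class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Mirrors the shape of the try-with-resources block quoted from EventFileCompressor. */
    static byte[] compress(final byte[] data, final TrackingStream sink) {
        try (final OutputStream ncos = new NonCloseableStandIn(sink);
             final OutputStream gzipOut = new GZIPOutputStream(ncos, 1)) {
            gzipOut.write(data);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        // Both resources have been closed by the time we get here.
        return sink.toByteArray();
    }

    /** Fails with an IOException (wrapped) if the GZIP stream is truncated or its CRC is wrong. */
    static byte[] decompress(final byte[] gz) {
        final ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (final GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            final byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }

    /** Reads ISIZE, the little-endian uncompressed length stored in the last 4 trailer bytes. */
    static long trailerInputSize(final byte[] gz) {
        final int i = gz.length - 4;
        return (gz[i] & 0xFFL)
                | ((gz[i + 1] & 0xFFL) << 8)
                | ((gz[i + 2] & 0xFFL) << 16)
                | ((gz[i + 3] & 0xFFL) << 24);
    }

    public static void main(final String[] args) {
        final byte[] data = "hello provenance".getBytes(StandardCharsets.UTF_8);
        final TrackingStream sink = new TrackingStream();
        final byte[] gz = compress(data, sink);

        System.out.println("round trip ok:     " + Arrays.equals(data, decompress(gz)));
        System.out.println("trailer ISIZE ok:  " + (trailerInputSize(gz) == data.length));
        System.out.println("sink still open:   " + !sink.closed);
    }
}
```

If the round trip succeeds and the ISIZE field matches the input length, the GZIP trailer was written when the try block exited. That would suggest the try-with-resources form does close the GZIPOutputStream, and that the corrupt record needs another explanation. This says nothing about what StreamUtils.copy or the surrounding NiFi code does.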
> Provenance repo "failed to perform background maintenance procedures" due failing to read schema
> ------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-5259
>                 URL: https://issues.apache.org/jira/browse/NIFI-5259
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>         Environment: Dockerized NiFi v1.6.0, with a link to a Registry instance, receiving data from MiNiFi java v0.4.0 and NiFi v1.6.0
>            Reporter: Joseph Percivall
>            Assignee: Mark Payne
>            Priority: Major
>
> Seeing an odd error (stack trace below) with the Provenance Repo, both as a background task and when attempting to query it. It's not getting a lot of data, and the issue persists through restarts of the container and also through stop/rm/docker-compose up of the container.
> Looking at the code, it's attempting to read the first record in the repo:
> final List<ProvenanceEventRecord> firstEvents = eventStore.getEvents(0, 1);
> Looking through the provenance record itself, the event appears to be missing that field altogether.
>
> {quote}2018-06-01 19:32:55,114 ERROR [Provenance Repository Maintenance-1] o.a.n.p.index.lucene.LuceneEventIndex Failed to perform background maintenance procedures
> java.io.IOException: Invalid Boolean value found when reading 'Repetition' of field 'Source System FlowFile Identifier'. Expected 0 or 1 but got 145
>     at org.apache.nifi.repository.schema.SchemaRecordReader.readField(SchemaRecordReader.java:107)
>     at org.apache.nifi.repository.schema.SchemaRecordReader.readRecord(SchemaRecordReader.java:72)
>     at org.apache.nifi.provenance.EventIdFirstSchemaRecordReader.readRecord(EventIdFirstSchemaRecordReader.java:138)
>     at org.apache.nifi.provenance.EventIdFirstSchemaRecordReader.nextRecord(EventIdFirstSchemaRecordReader.java:132)
>     at org.apache.nifi.provenance.serialization.CompressableRecordReader.nextRecord(CompressableRecordReader.java:287)
>     at org.apache.nifi.provenance.store.iterator.SequentialRecordReaderEventIterator.nextEvent(SequentialRecordReaderEventIterator.java:73)
>     at org.apache.nifi.provenance.store.iterator.AuthorizingEventIterator.nextEvent(AuthorizingEventIterator.java:47)
>     at org.apache.nifi.provenance.store.PartitionedEventStore.getEvents(PartitionedEventStore.java:214)
>     at org.apache.nifi.provenance.store.PartitionedEventStore.getEvents(PartitionedEventStore.java:158)
>     at org.apache.nifi.provenance.store.PartitionedEventStore.getEvents(PartitionedEventStore.java:148)
>     at org.apache.nifi.provenance.index.lucene.LuceneEventIndex.performMaintenance(LuceneEventIndex.java:650)
>     at org.apache.nifi.provenance.index.lucene.LuceneEventIndex.lambda$initialize$0(LuceneEventIndex.java:156)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {quote}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)