[ https://issues.apache.org/jira/browse/NIFI-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866320#comment-15866320 ]

ASF GitHub Bot commented on NIFI-3356:
--------------------------------------

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1493#discussion_r101104283
  
    --- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/claim/ContentClaimWriteCache.java ---
    @@ -0,0 +1,168 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.controller.repository.claim;
    +
    +import java.io.BufferedOutputStream;
    +import java.io.IOException;
    +import java.io.OutputStream;
    +import java.util.HashMap;
    +import java.util.LinkedList;
    +import java.util.Map;
    +import java.util.Queue;
    +
    +import org.apache.nifi.controller.repository.ContentRepository;
    +import org.apache.nifi.stream.io.ByteCountingOutputStream;
    +
    +public class ContentClaimWriteCache {
    +    private final ContentRepository contentRepo;
    +    private final Map<ResourceClaim, ByteCountingOutputStream> streamMap = new HashMap<>();
    +    private final Queue<ContentClaim> queue = new LinkedList<>();
    +    private final int bufferSize;
    +
    +    public ContentClaimWriteCache(final ContentRepository contentRepo) {
    +        this(contentRepo, 8192);
    +    }
    +
    +    public ContentClaimWriteCache(final ContentRepository contentRepo, final int bufferSize) {
    +        this.contentRepo = contentRepo;
    +        this.bufferSize = bufferSize;
    +    }
    +
    +    public void reset() throws IOException {
    +        try {
    +            forEachStream(OutputStream::close);
    +        } finally {
    +            streamMap.clear();
    +            queue.clear();
    +        }
    +    }
    +
    +    public ContentClaim getContentClaim() throws IOException {
    +        final ContentClaim contentClaim = queue.poll();
    +        if (contentClaim != null) {
    +            contentRepo.incrementClaimaintCount(contentClaim);
    +            return contentClaim;
    +        }
    +
    +        final ContentClaim claim = contentRepo.create(false);
    +        registerStream(claim);
    +        return claim;
    +    }
    +
    +    private ByteCountingOutputStream registerStream(final ContentClaim contentClaim) throws IOException {
    --- End diff --
    
    Another good catch - we can get rid of the ByteCountingOutputStream. Must have done some refactoring so that I didn't need it, and then left it. Will address.
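The suggestion to drop ByteCountingOutputStream could look roughly like the sketch below. It is a simplified, self-contained model, not the code that was eventually merged: `Claim`, `Repo`, `offer`, `getStream`, `contents`, and the in-memory storage are hypothetical stand-ins for NiFi's `ContentClaim` and `ContentRepository`, and the stream map is keyed by a resource id string rather than a `ResourceClaim`. It keeps the behavior visible in the diff: `getContentClaim()` reuses a queued claim (bumping its claimant count) or creates a new one, a new claim's stream is wrapped only in a `BufferedOutputStream`, and `reset()` closes every stream before clearing state in a `finally` block.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical model of a content-claim write cache without ByteCountingOutputStream.
// Claim and Repo are illustrative stand-ins, NOT NiFi's ContentClaim/ContentRepository.
public class WriteCacheSketch {

    // Stand-in for a content claim; compared by identity.
    public static final class Claim {
        public final String resourceId;
        Claim(String resourceId) { this.resourceId = resourceId; }
    }

    // Stand-in for a content repository that keeps each resource in memory.
    public static final class Repo {
        private final Map<String, ByteArrayOutputStream> storage = new HashMap<>();
        private final Map<Claim, Integer> claimantCounts = new HashMap<>();
        private int nextId = 0;

        Claim create() {
            Claim claim = new Claim("resource-" + nextId++);
            storage.put(claim.resourceId, new ByteArrayOutputStream());
            claimantCounts.put(claim, 1);
            return claim;
        }

        void incrementClaimantCount(Claim claim) {
            claimantCounts.merge(claim, 1, Integer::sum);
        }

        OutputStream write(Claim claim) {
            return storage.get(claim.resourceId);
        }

        public int claimantCount(Claim claim) {
            return claimantCounts.getOrDefault(claim, 0);
        }

        public String contents(Claim claim) {
            return storage.get(claim.resourceId).toString();
        }
    }

    private final Repo repo;
    private final int bufferSize;
    // One buffered stream per resource; a plain OutputStream is all the cache needs.
    private final Map<String, OutputStream> streamMap = new HashMap<>();
    private final Queue<Claim> queue = new ArrayDeque<>();

    public WriteCacheSketch(Repo repo, int bufferSize) {
        this.repo = repo;
        this.bufferSize = bufferSize;
    }

    public Claim getContentClaim() {
        Claim queued = queue.poll();
        if (queued != null) {
            // Reusing a queued claim: record the additional claimant.
            repo.incrementClaimantCount(queued);
            return queued;
        }
        Claim claim = repo.create();
        registerStream(claim);
        return claim;
    }

    // Wraps the repository stream in a BufferedOutputStream only; no byte counting.
    private OutputStream registerStream(Claim claim) {
        OutputStream buffered = new BufferedOutputStream(repo.write(claim), bufferSize);
        streamMap.put(claim.resourceId, buffered);
        return buffered;
    }

    // Hypothetical accessor for the cached stream of a claim (null if none).
    public OutputStream getStream(Claim claim) {
        return streamMap.get(claim.resourceId);
    }

    // Hypothetical helper: makes a claim available for reuse by getContentClaim().
    public void offer(Claim claim) {
        queue.offer(claim);
    }

    // Close (and thereby flush) every stream, then clear state even if a close fails.
    public void reset() throws IOException {
        try {
            for (OutputStream out : streamMap.values()) {
                out.close();
            }
        } finally {
            streamMap.clear();
            queue.clear();
        }
    }
}
```

Removing the wrapper is only safe if no caller still reads a running byte count from the cached stream; per the review comment, that dependency had already been refactored away.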


> Provide a newly refactored provenance repository
> ------------------------------------------------
>
>                 Key: NIFI-3356
>                 URL: https://issues.apache.org/jira/browse/NIFI-3356
>             Project: Apache NiFi
>          Issue Type: Task
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 1.2.0
>
>
> The Persistent Provenance Repository has been redesigned a few different 
> times over several years. The original design for the repository was to 
> provide storage of events and sequential iteration over those events via a 
> Reporting Task. After that, we added the ability to compress the data so that 
> it could be held longer. We then introduced the notion of indexing and 
> searching via Lucene. We've since made several more modifications to try to 
> boost performance.
> At this point, however, the repository is still the bottleneck for many flows 
> that handle large volumes of small FlowFiles. We need a new implementation 
> that is based around the current goals for the repository and that can 
> provide better throughput.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
