Adam,

If you believe that a process in the flow is manipulating the characters, you can use the built-in provenance, archive, and data viewer functions. We need to document how to set this up, but for now, if you configure nifi.properties as follows and restart, you'll have what you need. This all assumes you're on the latest develop branch codebase:
Set the following properties to the following values (these are just examples):

nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=10 MB
nifi.content.claim.max.flow.files=100
nifi.content.repository.directory.pub1=/your/path/for/content
nifi.content.repository.archive.max.retention.period=3 hours
nifi.content.repository.archive.max.usage.percentage=30%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false
nifi.content.viewer.url=/nifi-content-viewer/

nifi.provenance.repository.directory.prov1=/your/path/for/prov
nifi.provenance.repository.max.storage.time=24 hours
nifi.provenance.repository.max.storage.size=1 GB
nifi.provenance.repository.rollover.time=30 secs
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.query.threads=6
nifi.provenance.repository.compress.on.rollover=true
nifi.provenance.repository.always.sync=false
nifi.provenance.repository.journal.count=16
# Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
# EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, ContentType, Relationship, Details
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID
# FlowFile Attributes that should be indexed and made searchable
nifi.provenance.repository.indexed.attributes=
# Large values for the shard size will result in more Java heap usage when searching the Provenance Repository
# but should provide better performance
nifi.provenance.repository.index.shard.size=500 MB

Basically, the settings that differ from the defaults here are:

nifi.content.viewer.url=/nifi-content-viewer/
nifi.content.repository.archive.max.retention.period=3 hours
nifi.content.repository.archive.max.usage.percentage=30%
nifi.content.repository.archive.enabled=true

What this does is tell NiFi to hang onto the content until it actually has to delete it from disk. You can then look at the provenance trail of any object and 'view content' in our built-in content viewer, which lets you review the content step by step as it goes through the flow. We must make a nice blog post out of this with screenshots. It is a really powerful feature.

If that doesn't get you the info you need, please let us know.

Thanks
Joe

On Thu, Apr 30, 2015 at 2:20 PM, Adam Estrada <[email protected]> wrote:
> All,
>
> I am coming across an issue where my unicode characters are being converted
> to their unicode point representations (as javascript escapes) like this
> "\u0432\u0430\u0436\u043d\u0435\u0435". This is happening with Twitter data
> that is collected using the Twitter processor. How can I debug my workflow
> to figure out where the characters are being converted?
>
> Thanks,
> Adam
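
For anyone who would rather script the change than edit by hand, here is a minimal sketch that applies the four settings called out in the reply as differing from the defaults to the text of an existing nifi.properties file. The helper name `apply_overrides` and the `OVERRIDES` dictionary are hypothetical, not part of NiFi, and the values are only the examples from the message. It also assumes simple `key=value` lines with no line continuations.

```python
# Hypothetical helper, not part of NiFi. The values below are the example
# settings from the message; adjust them for your own installation.
OVERRIDES = {
    "nifi.content.viewer.url": "/nifi-content-viewer/",
    "nifi.content.repository.archive.max.retention.period": "3 hours",
    "nifi.content.repository.archive.max.usage.percentage": "30%",
    "nifi.content.repository.archive.enabled": "true",
}


def apply_overrides(text, overrides=OVERRIDES):
    """Return properties text with each override key set to its new value.

    Existing key=value lines are rewritten in place; comments and unrelated
    lines are kept as-is; keys that were not found are appended at the end.
    """
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        # Only touch real property lines, never comments or blank lines.
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    # Append any override whose key was absent from the original text.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

To use it, read your nifi.properties into a string, pass it through `apply_overrides`, write the result back (keeping a backup copy first), and restart NiFi as described in the reply.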
