I'd like to propose that all Get/Fetch/Query processors that are not
deprecated (or likely to be deprecated) get a standard convention for
attributes that describe things like:

1. Source system.
2. Database/table/index/collection/etc.
3. The lookup criteria that were used (similar to the "query attribute" some
processors already have).

Using GetMongo as an example, it would add something like this:

source.url=mongodb://localhost:27017
source.database=testdb
source.collection=test_collection
source.query={ "username": "john.smith" }
source.criteria.username=john.smith //GetMongo would parse the query and
add this.
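
To make the idea concrete, here is a rough sketch of how those attributes
might be derived from a query. It is plain Python, not NiFi or GetMongo code;
the function name and the rule of only flattening simple top-level equality
criteria are my own assumptions for illustration.

```python
import json


def source_attributes(url, database, collection, query_json):
    """Build the proposed source.* attributes for one lookup (hypothetical helper)."""
    attrs = {
        "source.url": url,
        "source.database": database,
        "source.collection": collection,
        "source.query": query_json,
    }
    query = json.loads(query_json)
    for field, value in query.items():
        # Assumption: skip top-level operators such as $or or $comment.
        if field.startswith("$"):
            continue
        # Assumption: only scalar equality criteria become attributes;
        # nested operator documents like {"$gt": 5} are left out.
        if isinstance(value, str):
            attrs["source.criteria." + field] = value
        elif isinstance(value, (int, float, bool)):
            attrs["source.criteria." + field] = json.dumps(value)
    return attrs


attrs = source_attributes(
    "mongodb://localhost:27017",
    "testdb",
    "test_collection",
    '{ "username": "john.smith" }',
)
```

Anything more complex than a simple equality match would still be visible in
source.query, just not broken out into source.criteria.* in this sketch.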

We have a use case where a team is coming from an extremely batch-oriented
view and really wants to know when "dataset X" was run. Our solution was to
extract that from the result set because the dataset name is one of the
fields in the JSON body.

I think this would expand what you can do out of the box with provenance
tracking. The attributes would provide a lot of useful information that
could be stored in Solr or ES and then queried against terminating
processors' DROP events to get a solid window into when jobs were run
historically.
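
As a rough illustration of that kind of lookup, the sketch below filters
already-exported provenance records in plain Python. The record shape (a dict
with "eventType", "timestamp" and "attributes" keys) and the function name are
assumptions for the example, not the actual provenance export format or a
Solr/ES query.

```python
def run_window(events, criteria):
    """Return (first, last) DROP timestamps for matching events, or None.

    events:   iterable of dicts in the hypothetical shape described above.
    criteria: attribute name -> required value, for example
              {"source.criteria.username": "john.smith"}.
    """
    times = sorted(
        e["timestamp"]
        for e in events
        if e["eventType"] == "DROP"
        and all(e["attributes"].get(k) == v for k, v in criteria.items())
    )
    return (times[0], times[-1]) if times else None


# Made-up sample records, timestamps in epoch milliseconds.
sample_events = [
    {"eventType": "DROP", "timestamp": 3000,
     "attributes": {"source.criteria.username": "john.smith"}},
    {"eventType": "DROP", "timestamp": 1000,
     "attributes": {"source.criteria.username": "john.smith"}},
    {"eventType": "RECEIVE", "timestamp": 500,
     "attributes": {"source.criteria.username": "john.smith"}},
    {"eventType": "DROP", "timestamp": 2000,
     "attributes": {"source.criteria.username": "jane.doe"}},
]
window = run_window(sample_events, {"source.criteria.username": "john.smith"})
```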

Thoughts?
