Hi Ali,
We have recently encountered some data sources that generate data in a
nested format. For example, AWS CloudTrail generates data in the
following JSON format:
{
  "Records": [
    {
      "eventVersion": "2.0",
      "userIdentity": {
        "type": "IAMUser",
        "principalId": "EX_PRINCIPAL_ID",
        "arn": "arn:aws:iam::123456789012:user/Alice",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "accountId": "123456789012",
        "userName": "Alice"
      },
      "eventTime": "2014-03-07T21:22:54Z",
      "eventSource": "ec2.amazonaws.com",
      "eventName": "StartInstances",
      "awsRegion": "us-east-2",
      "sourceIPAddress": "205.251.233.176",
      "userAgent": "ec2-api-tools 1.6.12.2",
      "requestParameters": {
        "instancesSet": {
          "items": [
            {
              "instanceId": "i-ebeaf9e2"
            }
          ]
        }
      },
      "responseElements": {
        "instancesSet": {
          "items": [
            {
              "instanceId": "i-ebeaf9e2",
              "currentState": {
                "code": 0,
                "name": "pending"
              },
              "previousState": {
                "code": 80,
                "name": "stopped"
              }
            }
          ]
        }
      }
    }
  ]
}
We are able to flatten this into a flat JSON document. However, since
nested objects are supported by the data backends Metron writes to (ES,
ORC, etc.), I was wondering whether the current version of Metron can
index nested documents, or whether we have to flatten them first.
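To make "flat" concrete, here is a minimal Python sketch that collapses
nested objects into dotted key names, so {"a": {"b": 1}} becomes
{"a.b": 1}. The function name `unfold` is hypothetical; it only
approximates the dotted-key flattening that Metron's JSON parser
describes for mapStrategy UNFOLD, and leaving arrays untouched is an
assumption of this sketch, not a statement about Metron's behaviour.

```python
import json
from typing import Any, Dict


def unfold(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested maps into dotted keys (hypothetical helper).

    Assumption: lists are kept as-is; only nested dicts are unfolded.
    """
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the inner map, carrying the dotted prefix along.
            flat.update(unfold(value, full_key))
        else:
            flat[full_key] = value
    return flat


# A trimmed-down CloudTrail event taken from the sample above.
event = {
    "eventName": "StartInstances",
    "userIdentity": {"type": "IAMUser", "userName": "Alice"},
    "responseElements": {
        "instancesSet": {"items": [{"instanceId": "i-ebeaf9e2"}]}
    },
}
print(json.dumps(unfold(event), indent=2))
```

The nested userIdentity object turns into top-level fields such as
userIdentity.type and userIdentity.userName, which an index can store
without nested-document support.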
We parse the same CloudTrail data. First, we run Apache NiFi, which
extracts the individual events from the "Records" array. Second, make
sure that you set mapStrategy to UNFOLD in your JSON parser.
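The first step (splitting one CloudTrail file into one message per
event) can be sketched in Python as below. This only illustrates what
the NiFi stage accomplishes; the function name `split_records` is
hypothetical, and in NiFi itself a processor such as SplitJson with a
JsonPath of $.Records is one assumed way to do it, not necessarily the
flow described above. Each resulting message would then go to the JSON
parser configured with mapStrategy UNFOLD.

```python
import json
from typing import List


def split_records(raw: str) -> List[str]:
    """Split a CloudTrail log document into one JSON string per event.

    Hypothetical helper: mirrors the role of the NiFi splitting step.
    """
    document = json.loads(raw)
    # Each element of "Records" is an independent event; emit them separately.
    return [json.dumps(record) for record in document.get("Records", [])]


raw = json.dumps({
    "Records": [
        {"eventName": "StartInstances", "userIdentity": {"userName": "Alice"}},
        {"eventName": "StopInstances"},
    ]
})
for message in split_records(raw):
    print(message)
```

Splitting before parsing keeps each indexed document small and lets the
parser treat every event as an independent record.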