For completeness' sake, here is the JOLT magic needed to flatten the data in
the way I was aiming for:

[
  {  //this operation just tags a name to the record
    "operation": "shift",
    "spec": {
      "*": "record.&"
    }
  },
  {  //this operation does the actual flattening (but needs the outer tag
     //to anchor the work; hence the previous operation)
    "operation": "shift",
    "spec": {
      "record": {
        "*": {
          "$": "TValue[#2].name",
          "@": "TValue[#2].value"
        }
      }
    }
  },
  { //this operation just adds a pre-formatted key/value onto each item
    "operation": "default",
    "spec": {
      "TValue[]": {
        "*": {
          "class": "unclass"
        }
      }
    }
  }
]
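
If it helps to see the same three steps outside of JOLT, here is a rough
Python sketch of what the chain does to a single record. It only mimics the
effect; it is not how JOLT works internally. The helper name flatten_record
and the default_class parameter are made up for illustration, and the keys
are spelled the way they appear in the output shown further down.

```python
import json


def flatten_record(record, default_class="unclass"):
    """Mimic the three operations on one flat JSON record (hypothetical helper)."""
    # Step 1 (first shift): tuck every field under an outer "record" key.
    wrapped = {"record": dict(record)}
    # Step 2 (second shift): turn each key/value pair into one list item.
    items = [
        {"name": key, "value": value}
        for key, value in wrapped["record"].items()
    ]
    # Step 3 (default): add "class" only where an item does not already have it.
    for item in items:
        item.setdefault("class", default_class)
    return {"TValue": items}


# A shortened version of the first record from the data below.
row = {"Store": "1", "Date": "05-02-2010", "Weekly_Sales": "1643690.9"}
flat = flatten_record(row)
print(json.dumps(flat, separators=(",", ":")))
```

Step 3 uses setdefault on purpose: like the JOLT default operation, it leaves
an existing "class" value alone and only fills in the missing ones.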

CSV Data:
> Store,Date,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
> 1,05-02-2010,1643690.9,0,42.31,2.572,211.0963582,8.106
> 1,12-02-2010,1641957.44,1,38.51,2.548,211.2421698,8.106
> 1,19-02-2010,1611968.17,0,39.93,2.514,211.2891429,8.106
> 1,26-02-2010,1409727.59,0,46.63,2.561,211.3196429,8.106

Converted into Bare JSON:
{"Store":"1","Date":"05-02-2010","Weekly_Sales":"1643690.9","Holiday_Flag":"0","Temperature":"42.31","Fuel_Price":"2.572","CPI":"211.0963582","Unemployment":"8.106"}
{"Store":"1","Date":"12-02-2010","Weekly_Sales":"1641957.44","Holiday_Flag":"1","Temperature":"38.51","Fuel_Price":"2.548","CPI":"211.2421698","Unemployment":"8.106"}
{"Store":"1","Date":"19-02-2010","Weekly_Sales":"1611968.17","Holiday_Flag":"0","Temperature":"39.93","Fuel_Price":"2.514","CPI":"211.2891429","Unemployment":"8.106"}
{"Store":"1","Date":"26-02-2010","Weekly_Sales":"1409727.59","Holiday_Flag":"0","Temperature":"46.63","Fuel_Price":"2.561","CPI":"211.3196429","Unemployment":"8.106"}
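
For anyone who wants to reproduce that intermediate step without a flow
handy, a minimal Python sketch of the CSV to one-JSON-object-per-line
conversion could look like the following. This is just an approximation
using the standard library, not what the NiFi processors do under the hood;
csv_to_json_lines is a hypothetical name, and only the first two data rows
are included.

```python
import csv
import io
import json

# Header plus the first two data rows from the sample above.
CSV_TEXT = """Store,Date,Weekly_Sales,Holiday_Flag,Temperature,Fuel_Price,CPI,Unemployment
1,05-02-2010,1643690.9,0,42.31,2.572,211.0963582,8.106
1,12-02-2010,1641957.44,1,38.51,2.548,211.2421698,8.106
"""


def csv_to_json_lines(text):
    """Return one compact JSON object string per CSV row (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    # Every value stays a string, matching the bare JSON shown above.
    return [json.dumps(row, separators=(",", ":")) for row in reader]


json_lines = csv_to_json_lines(CSV_TEXT)
for line in json_lines:
    print(line)
```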

And the result of the JOLT processor with the above operations applied:
{"TValue":[{"name":"Store","value":"1","class":"unclass"},{"name":"Date","value":"05-02-2010","class":"unclass"},{"name":"Weekly_Sales","value":"1643690.9","class":"unclass"},{"name":"Holiday_Flag","value":"0","class":"unclass"},{"name":"Temperature","value":"42.31","class":"unclass"},{"name":"Fuel_Price","value":"2.572","class":"unclass"},{"name":"CPI","value":"211.0963582","class":"unclass"},{"name":"Unemployment","value":"8.106","class":"unclass"}]}
{"TValue":[{"name":"Store","value":"1","class":"unclass"},{"name":"Date","value":"12-02-2010","class":"unclass"},{"name":"Weekly_Sales","value":"1641957.44","class":"unclass"},{"name":"Holiday_Flag","value":"1","class":"unclass"},{"name":"Temperature","value":"38.51","class":"unclass"},{"name":"Fuel_Price","value":"2.548","class":"unclass"},{"name":"CPI","value":"211.2421698","class":"unclass"},{"name":"Unemployment","value":"8.106","class":"unclass"}]}
{"TValue":[{"name":"Store","value":"1","class":"unclass"},{"name":"Date","value":"19-02-2010","class":"unclass"},{"name":"Weekly_Sales","value":"1611968.17","class":"unclass"},{"name":"Holiday_Flag","value":"0","class":"unclass"},{"name":"Temperature","value":"39.93","class":"unclass"},{"name":"Fuel_Price","value":"2.514","class":"unclass"},{"name":"CPI","value":"211.2891429","class":"unclass"},{"name":"Unemployment","value":"8.106","class":"unclass"}]}
{"TValue":[{"name":"Store","value":"1","class":"unclass"},{"name":"Date","value":"26-02-2010","class":"unclass"},{"name":"Weekly_Sales","value":"1409727.59","class":"unclass"},{"name":"Holiday_Flag","value":"0","class":"unclass"},{"name":"Temperature","value":"46.63","class":"unclass"},{"name":"Fuel_Price","value":"2.561","class":"unclass"},{"name":"CPI","value":"211.3196429","class":"unclass"},{"name":"Unemployment","value":"8.106","class":"unclass"}]}

To be fair, I'm really going to dump the data out as Avro... but a) I didn't
see much of a question about how to do that (just configure the JOLT processor
accordingly) and b) that's not nearly as readable.

hth,

mew
