
I have a DataFrame that represents my data; its schema looks like this:

| col_name    |         data_type          |
| obj_id      | string                     |
| type        | string                     |
| name        | string                     |
| metric_name | string                     |
| value       | double                     |
| ts          | timestamp                  |

It is working fine, and I can store it to parquet with:


I would like to leverage Parquet partitioning as referenced here, and
end up with a directory layout something like this:

|__ data
      |__ metrics
            |__ type=Virtual Machine
                  |__ objId=1234
                        |__ metricName=CPU Demand
                              |__ yyyymmdd
                                    |__ data.parquet
                        |__ metricName=CPU Utilization
                              |__ yyyymmdd
                                    |__ data.parquet
                  |__ objId=5678
                        |__ metricName=CPU Demand
                              |__ yyyymmdd
                                    |__ data.parquet
            |__ type=Application
                  |__ objId=0009
                        |__ metricName=Response Time
                              |__ yyyymmdd
                                    |__ data.parquet
                        |__ metricName=Slow Response
                              |__ yyyymmdd
                                    |__ data.parquet
                  |__ objId=0303
                        |__ metricName=Response Time
                              |__ yyyymmdd
                                    |__ data.parquet

What is the correct way to achieve this? I can do something like:

df.map { case Row(nodeType: String, objId: String, name: String,
                  metricName: String, value: Double, ts: java.sql.Timestamp) =>

   // construct path
   val path = 
   // save record as parquet (pseudocode)
   df.saveAsParquet(path, Row)
}

Is this the right approach, or is there a better one? As written, this
would save every row as an individual file. I will receive multiple
entries for a given metric, type and objId combination in a given day.
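
For comparison, here is a minimal sketch of a partitioned write using
DataFrameWriter.partitionBy instead of a per-row map. It assumes a Spark
version where partitionBy and functions.date_format are available (1.5 or
later, as far as I know). The object name PartitionedMetricsWriter, the
derived column name "day" and the basePath parameter are illustrative
names, not something from my current code.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.{col, date_format}

object PartitionedMetricsWriter {

  // Writes df under basePath with one directory level per partition column.
  // basePath is a hypothetical output location, e.g. "data/metrics".
  def write(df: DataFrame, basePath: String): Unit = {
    df
      // Derive a day-granularity string (e.g. 20150601) from the ts column.
      // "day" is an illustrative column name.
      .withColumn("day", date_format(col("ts"), "yyyyMMdd"))
      .write
      // Append so that later batches for the same day add files
      // rather than replacing what is already there.
      .mode(SaveMode.Append)
      // The order of the columns determines the directory nesting.
      .partitionBy("type", "obj_id", "metric_name", "day")
      .parquet(basePath)
  }
}
```

With this sketch the directories would be named after the actual column
names (obj_id=..., metric_name=..., day=...) rather than objId/metricName
or a bare yyyymmdd, and the files inside would be Spark's part-* files
rather than a single data.parquet. All rows sharing a type, obj_id,
metric_name and day end up in the same directory, so nothing is written
one row per file.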

TIA for the assistance.

