voonhous commented on issue #17744:
URL: https://github.com/apache/hudi/issues/17744#issuecomment-3822180385

   How to obtain raw variant and metadata bytes:
   
   Run the following pyspark scripts in - This will be STEP :
   https://github.com/apache/hudi/issues/17744#issuecomment-3822090390
   https://github.com/apache/hudi/issues/17744#issuecomment-3710064394
   
   
   Install parquet-tools in your virtualenv
   ```shell
   pip3 install parquet-tools
   ```
   
   
   Then modify the 
`venv/lib/python-version/site-packages/parquet_tools/commands/show.py`
   
   Replace everything from `_execute` function with the code below:
   
   
   ```python
   def _execute(df: pd.DataFrame, format: str, head: int, columns: list) -> 
None:
       # head
       df_head: pd.DataFrame = df.head(head) if head > 0 else df
       # select columns
       df_select: pd.DataFrame = df_head[columns] if len(columns) else df_head
       df_hex = df_select.applymap(to_hex_recursive)
       print(tabulate(df_hex, df_select.columns, tablefmt=format, 
showindex=False))
       print(tabulate(df_select, df_select.columns, tablefmt=format, 
showindex=False))
   
   def to_hex_recursive(val):
       if val is None:
           return None
   
       if isinstance(val, (bytes, bytearray)):
           return bytes_to_hex_commas(val)
   
       if isinstance(val, dict):
           return {k: to_hex_recursive(v) for k, v in val.items()}
   
       if isinstance(val, (list, tuple)):
           return type(val)(to_hex_recursive(v) for v in val)
   
       return val
   
   def bytes_to_hex_commas(b: bytes) -> str:
       return ",".join(f"0x{x:02x}" for x in b)
   ```
   
   
   Then run: 
   
   `parquet-tools show output-to-parquet-STEP 1`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to