kasakrisz opened a new pull request #6: AMBARI-24761 - Infra Manager: hive support for archiving Infra Solr
URL: https://github.com/apache/ambari-infra/pull/6

## What changes were proposed in this pull request?

- When archiving documents stored in Solr collections, the output JSON file is compressed. Change the output file compressor from tar.gz to bzip2 because Hive cannot process tar.gz.
- When serializing documents to JSON, integer-type fields should be serialized as integers, not as strings. Instead of
  ```
  "line_number":"315",
  ```
  use
  ```
  "line_number":315,
  ```
  because these fields are declared as integers in the target Hive table and Hive's `org.apache.hive.hcatalog.data.JsonSerDe` serializer expects integers.
- Adjust UTs and ITs.

## How was this patch tested?

1. Run UTs and ITs.
2. Manually:
   - Deploy Ambari and a cluster including Infra Solr, Infra Manager, Logsearch, Ranger, Hive, and HDFS.
   - Enable Ranger plugins.
   - Create folders on HDFS to store the exported data, and set permissions so that Hive can read from the folders and Infra Manager can write to them.
   - Export data from Solr to HDFS using Infra Manager.
   - Create external tables in Hive for the exported data (https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.2/bk_ambari-operations/content/amb_infra_arch_n_purge_command_line_operations.html).
   - Select data from the tables using a Hive query.
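
For illustration only, the two changes described above (integer fields written as JSON numbers, and bzip2 instead of tar.gz for the output file) can be sketched as follows. This is not the code from the patch, which lives in the Infra Manager itself. It is a minimal Python sketch that assumes the export is written as one JSON object per line; the helper names `to_json_line` and `write_bzip2`, and the `INTEGER_FIELDS` set, are hypothetical. Only `line_number` is taken from the description.

```python
import bz2
import json

# Hypothetical set of integer-typed fields; the PR description only
# names "line_number" as an example.
INTEGER_FIELDS = {"line_number"}


def to_json_line(doc, integer_fields=INTEGER_FIELDS):
    """Serialize one document, emitting integer fields as JSON numbers."""
    converted = {
        key: int(value) if key in integer_fields and value is not None else value
        for key, value in doc.items()
    }
    # Compact separators mirror the "line_number":315 form shown above.
    return json.dumps(converted, separators=(",", ":"))


def write_bzip2(docs, path):
    """Write one JSON object per line into a bzip2-compressed text file."""
    with bz2.open(path, "wt", encoding="utf-8") as out:
        for doc in docs:
            out.write(to_json_line(doc) + "\n")
```

With this sketch, a document such as `{"line_number": "315", "level": "INFO"}` is written with `"line_number":315` rather than `"line_number":"315"`, which is the form a Hive column declared as an integer needs.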
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services