kasakrisz opened a new pull request #6: AMBARI-24761 - Infra Manager: Hive 
support for archiving Infra Solr
URL: https://github.com/apache/ambari-infra/pull/6
 
 
   ## What changes were proposed in this pull request?
   
   - When documents stored in Solr collections are archived, the output JSON 
file is compressed. Change the output file compression from tar.gz to bzip2, 
because Hive cannot process tar.gz.
   - When documents are serialized to JSON, integer-typed fields should be 
serialized as integers, not as strings.
   Instead of 
   ```
   "line_number":"315",
   ```
   use
   ```
   "line_number":315,
   ```
   because these fields are declared as integers in the target Hive table and 
Hive's `org.apache.hive.hcatalog.data.JsonSerDe` serializer expects integers.
   - Adjust the unit tests (UTs) and integration tests (ITs) accordingly.
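
The two changes above can be sketched together as follows. This is an illustrative Python sketch under stated assumptions, not the Infra Manager implementation: the names `INTEGER_FIELDS`, `to_json_line` and `write_archive` are hypothetical, and the one-JSON-object-per-line layout is an assumption about the archive format. Only the `line_number` field comes from the example above.

```python
import bz2
import json

# Hypothetical set of fields declared as integers in the target Hive table.
INTEGER_FIELDS = {"line_number"}


def to_json_line(document, integer_fields=INTEGER_FIELDS):
    """Serialize one document, emitting integer fields as JSON numbers."""
    converted = {
        key: int(value) if key in integer_fields and value is not None else value
        for key, value in document.items()
    }
    # Compact separators give "line_number":315 rather than "line_number": 315.
    return json.dumps(converted, separators=(",", ":"))


def write_archive(documents, path):
    """Write one JSON object per line into a bzip2-compressed file (assumed layout)."""
    with bz2.open(path, "wt", encoding="utf-8") as out:
        for document in documents:
            out.write(to_json_line(document) + "\n")
```

With this sketch, a document whose `line_number` arrives as the string `"315"` is written as the number `315`, and the resulting file is plain bzip2 rather than a tar archive.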
   
   ## How was this patch tested?
   
   1. Run UTs and ITs
   2. Manually:
   - Deploy Ambari and a cluster that includes Infra Solr, Infra Manager, 
Logsearch, Ranger, Hive, and HDFS
   - Enable Ranger plugins
   - Create folders on HDFS to store the exported data, and set permissions so 
that Hive can read from the folders and Infra Manager can write to them
   - Export data from Solr to HDFS using Infra Manager
   - Create external tables in Hive for exported data 
(https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.2/bk_ambari-operations/content/amb_infra_arch_n_purge_command_line_operations.html)
   - Select data from the tables using a Hive query
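
As an optional local sanity check before creating the external tables, an exported file can be inspected along these lines. This is a minimal sketch, assuming the archive holds one JSON object per line. The helper name `find_non_integer_fields` is hypothetical, and `line_number` is taken from the example in the description.

```python
import bz2
import json


def find_non_integer_fields(path, integer_fields=("line_number",)):
    """Return (line_no, field, value) for each listed field that is not a JSON integer."""
    problems = []
    with bz2.open(path, "rt", encoding="utf-8") as archive:
        for line_no, line in enumerate(archive, start=1):
            if not line.strip():
                continue  # skip blank lines
            document = json.loads(line)
            for field in integer_fields:
                value = document.get(field)
                if value is None:
                    continue  # a missing or null field is not checked here
                # bool is a subclass of int in Python, so exclude it explicitly.
                if isinstance(value, bool) or not isinstance(value, int):
                    problems.append((line_no, field, value))
    return problems
```

An empty result means that every checked field was a JSON number. A file that is not valid bzip2 raises an error when it is read.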
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
