[https://issues.apache.org/jira/browse/AMBARI-24761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648803#comment-16648803]

ASF GitHub Bot commented on AMBARI-24761:
-----------------------------------------

kasakrisz opened a new pull request #6: AMBARI-24761 - Infra Manager: hive support for archiving Infra Solr
URL: https://github.com/apache/ambari-infra/pull/6
 
 
   ## What changes were proposed in this pull request?
   
   - When documents stored in Solr collections are archived, the output JSON file is compressed. Change the output file compression from tar.gz to bzip2, because Hive cannot process tar.gz files.
   - When serializing documents to JSON, integer-typed fields should be serialized as integers, not as strings.
   Instead of 
   ```
   "line_number":"315",
   ```
   use
   ```
   "line_number":315,
   ```
   because these fields are declared as integers in the target Hive table, and Hive's `org.apache.hive.hcatalog.data.JsonSerDe` expects integer values for them.
   - Adjust unit tests (UTs) and integration tests (ITs).
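The two output changes can be illustrated with a minimal sketch. Infra Manager itself is written in Java; this Python version only shows the intended output format, assuming one JSON document per line (the layout Hive's JsonSerDe reads from text files). `INTEGER_FIELDS`, `to_hive_json`, and `export_bzip2` are hypothetical names, not Infra Manager APIs.

```python
import bz2
import json

# Hypothetical: the Solr fields that the target Hive table declares as INT.
INTEGER_FIELDS = {"line_number"}


def to_hive_json(doc, integer_fields=INTEGER_FIELDS):
    """Serialize one Solr document as a compact single-line JSON object,
    emitting integer-typed fields as JSON numbers (315) instead of strings ("315")."""
    converted = {
        key: int(value) if key in integer_fields and isinstance(value, str) else value
        for key, value in doc.items()
    }
    return json.dumps(converted, separators=(",", ":"))


def export_bzip2(docs, integer_fields=INTEGER_FIELDS):
    """Write the documents as newline-delimited JSON (one document per line)
    and compress the whole payload as a single bzip2 stream, with no tar container."""
    payload = "".join(to_hive_json(doc, integer_fields) + "\n" for doc in docs)
    return bz2.compress(payload.encode("utf-8"))
```

For example, `to_hive_json({"line_number": "315"})` returns `{"line_number":315}`. The bytes returned by `export_bzip2` can be saved to a file ending in `.bz2`; Hadoop's text input selects the bzip2 codec from that extension, whereas a tar archive would still need unpacking after gunzip.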
   
   ## How was this patch tested?
   
   1. Run UTs and ITs
   2. Manually:
   - Deploy Ambari and a cluster including Infra Solr, Infra Manager, Logsearch, Ranger, Hive, and HDFS
   - Enable Ranger plugins
   - Create folders on HDFS to store the exported data, and set permissions so that Hive can read from them and Infra Manager can write to them
   - Export data from Solr to HDFS using Infra Manager
   - Create external tables in Hive for the exported data (https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.2.2/bk_ambari-operations/content/amb_infra_arch_n_purge_command_line_operations.html)
   - Select data from the tables using Hive queries
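Before pointing a Hive external table at the exported files, a quick local check can catch the string-versus-integer mismatch described above. This is a minimal sketch, assuming the export is a single bzip2 stream of newline-delimited JSON; `EXPECTED_TYPES` is a hypothetical subset of the Hive schema and `check_export` is an illustrative helper, not part of Infra Manager or Hive.

```python
import bz2
import json

# Hypothetical subset of the target Hive table schema: column -> expected JSON type.
EXPECTED_TYPES = {"line_number": int, "log_message": str}


def check_export(data, expected_types=EXPECTED_TYPES):
    """Decompress a bzip2 export and report lines that do not match the
    expected schema: invalid JSON, non-object records, or mistyped columns.
    Returns an empty list when no problems are found."""
    problems = []
    text = bz2.decompress(data).decode("utf-8")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        for column, expected in expected_types.items():
            if column not in record:
                continue  # absent columns are not checked here
            value = record[column]
            # bool is a subclass of int in Python, so reject it explicitly for INT columns.
            if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
                problems.append(
                    f"line {lineno}: {column} is {type(value).__name__}, expected {expected.__name__}"
                )
    return problems
```

A record such as `{"line_number":"315"}` is reported as `line 1: line_number is str, expected int`, which is the case the integer serialization change is meant to eliminate.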
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 


> Infra Manager: hive support for archiving Infra Solr
> ----------------------------------------------------
>
>                 Key: AMBARI-24761
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24761
>             Project: Ambari
>          Issue Type: Bug
>          Components: infra
>    Affects Versions: 2.8.0
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.8.0
>
>
> When exporting Solr documents from the logsearch and ranger collections, save 
> them in a format that Hive can parse.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
