Santosh,

The flow that you've outlined seems reasonable, but it is certainly better 
if you don't have to split the data up, both for performance and for keeping 
the flow simple to design. I would imagine that PutMongoRecord missing the 
Upsert mode is simply an oversight and can be addressed. If you're inclined 
to take a stab at that, it would likely yield the best results.
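For reference, a batched upsert against MongoDB boils down to a single `update` command carrying one entry per record, each with its own match query and `upsert: true` flag. The sketch below builds that command shape as plain dictionaries; the function and field names (`build_upsert_command`, `key_field`) are illustrative only, not NiFi or driver APIs.

```python
import json

def build_upsert_command(collection, records, key_field):
    """Build a MongoDB `update` command document that upserts a whole
    batch of records in one round trip -- roughly the behavior an
    upsert mode on PutMongoRecord would provide."""
    return {
        "update": collection,
        "updates": [
            {
                "q": {key_field: rec[key_field]},  # match existing doc by key
                "u": {"$set": rec},                # merge/overwrite fields
                "upsert": True,                    # insert when no match
            }
            for rec in records
        ],
        "ordered": False,  # independent records keep going on error
    }

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
cmd = build_upsert_command("events", records, "id")
print(json.dumps(cmd, indent=2))
```

Sending the whole batch in one command is what makes the record-oriented approach so much cheaper than one PutMongo invocation per document.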

Barring that, you may also want to try SelectHiveQL -> SplitRecord -> 
PutMongo instead of ConvertRecord + SplitJson. SplitRecord can read the 
incoming data in any format and write it out in any other, so it handles the 
job of both Convert and Split, and does so much more efficiently. That would 
certainly make your dataflow more efficient, but I don't know whether it 
would give you the performance gain that you need, given that PutMongo would 
still be putting individual records.
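Conceptually, the convert-and-split work that SplitRecord collapses into one pass looks like the sketch below: parse the incoming array once and emit one serialized document per element, each ready for a downstream put. The function name `split_records` is illustrative only, not a NiFi API.

```python
import json

def split_records(array_json: str):
    """Parse an incoming JSON array and emit one serialized JSON
    document per element -- the combined job that ConvertRecord +
    SplitJson otherwise do in two separate passes."""
    return [json.dumps(rec) for rec in json.loads(array_json)]

incoming = '[{"id": 1}, {"id": 2}, {"id": 3}]'
flowfiles = split_records(incoming)
# each entry is an independent document for the downstream processor
for ff in flowfiles:
    print(ff)
```

Doing the parse once instead of twice is where the efficiency win comes from, but the downstream put still happens per record.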

Thanks
-Mark


> On Jun 11, 2019, at 1:41 AM, Santosh Pawar <[email protected]> wrote:
> 
> Hi Team,
> 
> I am ingesting data from Hive to MongoDB using the flow mentioned below:
> 
> SelectHiveQL -> ConvertRecord -> SplitJson -> PutMongo
> 
> Is there any way to ingest data from Hive to MongoDB using NiFi? I have
> used two processors to push data into MongoDB, PutMongoRecord and
> PutMongo, but each has a limitation, as given below:
> 
> 1. PutMongoRecord processor: upsert mode is not available
> 
> 2. PutMongo processor: there is an upsert mode, but it pushes a single
> object at a time
> 
> Question: Is there any way to insert a JSON array directly into MongoDB?
> In the current flow there is a performance issue that is becoming a
> bottleneck as the number of files being processed grows very large, since
> an array of JSON cannot be inserted directly and the SplitJson processor
> adds the overhead of splitting each array object out of these files.
> 
> Thanks,
> Santosh Pawar
