Re: Extracting key word from a textual column

2016-08-03 Thread Mich Talebzadeh
Guys, we are going off on a tangent here. The question was: what is the easiest way of finding key words in a string column such as transactiondescription? I am aware of functions like regexp, instr, patindex etc. How is this done in general, and not necessarily in Spark? For example. A naive question.

Re: Extracting key word from a textual column

2016-08-02 Thread Jörn Franke
I agree with you.

> On 03 Aug 2016, at 01:20, ayan guha wrote:
>
> I would stay away from transaction tables until they are fully baked. I do not see why you need to update vs keep inserting with timestamp and while joining derive latest value on the fly.
>
> But I

Re: Extracting key word from a textual column

2016-08-02 Thread Jörn Franke
Phoenix will become another standard query interface of hbase. I do not agree that using hbase directly will lead to faster performance. It always depends on how you use it. While it is another component, it can make sense to use it. This has to be evaluated on a case by case basis. If you only

Re: Extracting key word from a textual column

2016-08-02 Thread ayan guha
I would stay away from transaction tables until they are fully baked. I do not see why you need to update vs keep inserting with timestamp and while joining derive latest value on the fly. But I guess it has become a religious question now :) and I am not unbiased. On 3 Aug 2016 08:51, "Mich

Re: Extracting key word from a textual column

2016-08-02 Thread Ted Yu
+1

> On Aug 2, 2016, at 2:29 PM, Jörn Franke wrote:
>
> If you need to use single inserts, updates, deletes, select why not use hbase with Phoenix? I see it as complementary to the hive / warehouse offering
>
>> On 02 Aug 2016, at 22:34, Mich Talebzadeh

Re: Extracting key word from a textual column

2016-08-02 Thread Mich Talebzadeh
Hi, I decided to create a catalog table in Hive ORC and transactional. That table has two columns of value
1. transactiondescription === account_table.transactiondescription
2. hashtag String column created from a semi automated process of deriving it from

Re: Extracting key word from a textual column

2016-08-02 Thread Sonal Goyal
Hi Mich, It seems like an entity resolution problem - looking at different representations of an entity - SAINSBURY in this case and matching them all together. How dirty is your data in the description - are there stop words like SACAT/SMKT etc you can strip off and get the base retailer entity
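
A minimal sketch of the stop-word stripping idea raised in the message above, in plain Python rather than Spark. SACAT and SMKT are the tokens named in the message; the function name and the sample descriptions are hypothetical, and a real stop-word list would have to come from inspecting the data.

```python
import re

# SACAT and SMKT are taken from the message above; any further entries
# would be an assumption based on what the real data looks like.
STOP_WORDS = {"SACAT", "SMKT"}

def base_entity(description):
    """Upper-case the text, keep alphabetic tokens only, drop stop words."""
    # Digits and punctuation (apart from apostrophes) are discarded here.
    tokens = re.findall(r"[A-Za-z']+", description.upper())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(kept)

if __name__ == "__main__":
    # Hypothetical sample descriptions.
    for text in ["SACAT SAINSBURY", "Sainsburys Smkt 1234"]:
        print(text, "->", base_entity(text))
```

This only removes noise tokens. It does not merge spelling variants of the same retailer, which is the harder part of entity resolution.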

Re: Extracting key word from a textual column

2016-08-02 Thread Mich Talebzadeh
Thanks. I believe there is some catalog of companies that I can get and store it in a table and match the company name to the transactiondescription column. That catalog should have sectors in it. For example company XYZ is under Grocers etc which will make search and grouping much easier. I believe
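
A rough sketch of the catalog lookup described above, in plain Python. The catalog contents are hypothetical (XYZ under Grocers is the example from the message, and SAINSBURY is assumed to be in the same sector), and a simple substring match stands in for whatever matching rule the real catalog would need.

```python
# Hypothetical catalog mapping a company name to its sector.
CATALOG = {
    "SAINSBURY": "Grocers",  # assumed sector, for illustration only
    "XYZ": "Grocers",        # example used in the message above
}

def lookup_company(transactiondescription):
    """Return (company, sector) for the first catalog name found in the text."""
    text = transactiondescription.upper()
    for company, sector in CATALOG.items():
        if company in text:
            return company, sector
    return None  # no catalog entry matched

if __name__ == "__main__":
    print(lookup_company("SAINSBURYS SMKT LONDON"))
    print(lookup_company("CORNER SHOP"))
```

Substring matching is cheap but can produce false hits on short names, so a real catalog would probably need word boundaries or a stricter rule.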

Re: Extracting key word from a textual column

2016-08-02 Thread Yong Zhang
Well, if you still want to use a window function for your logic, then you need to derive a new column, like "catalog", and use it as part of the grouping logic. Maybe you can use regex for deriving this new column. The implementation needs to depend on your data in
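
One way to picture the suggestion above, sketched in plain Python instead of a Spark window function: derive a "catalog" value from each description with a regex, then group on it. The pattern, the "OTHER" bucket, the amounts and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pattern listing the retailer names to recognise.
CATALOG_PATTERN = re.compile(r"\b(SAINSBURY|XYZ)", re.IGNORECASE)

def derive_catalog(description):
    """Derive the 'catalog' value; unmatched rows go to an 'OTHER' bucket."""
    match = CATALOG_PATTERN.search(description)
    return match.group(1).upper() if match else "OTHER"

def totals_by_catalog(rows):
    """Sum amounts per derived catalog value; rows are (description, amount)."""
    totals = defaultdict(float)
    for description, amount in rows:
        totals[derive_catalog(description)] += amount
    return dict(totals)

if __name__ == "__main__":
    # Hypothetical transactions.
    rows = [
        ("SAINSBURYS SMKT", 10.0),
        ("SACAT SAINSBURY", 5.5),
        ("XYZ LTD", 2.0),
        ("CORNER SHOP", 1.0),
    ]
    print(totals_by_catalog(rows))
```

In Spark the same derived column could then serve as the partition key of the window, which is the step the message refers to.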