Guys,
We are going off on a tangent here. The question was: what is the easiest way
of finding keywords in a string column such as transactiondescription?
I am aware of functions like regexp, instr, patindex etc.
How is this done in general, not necessarily in Spark?
For example. A naive question.
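To make the question concrete, here is a minimal sketch of one general approach outside Spark: test each description against a case-insensitive regular expression per keyword. The helper name find_keywords and the sample descriptions are made up for illustration and are not from the actual data.

```python
import re

def find_keywords(description, keywords):
    """Return the keywords that start a word in description (case-insensitive)."""
    found = []
    for kw in keywords:
        # A leading \b only: "SAINSBURY" matches "SAINSBURYS" and "Sainsbury's",
        # while "TFL" does not match inside "OUTFLOW".
        pattern = r"\b" + re.escape(kw)
        if re.search(pattern, description, flags=re.IGNORECASE):
            found.append(kw)
    return found

# Hypothetical rows standing in for the transactiondescription column
descriptions = [
    "SAINSBURYS SMKT LONDON",
    "Sainsbury's S/MKT CD 1234",
    "TFL TRAVEL CHARGE",
]
keywords = ["SAINSBURY", "TFL"]

for d in descriptions:
    print(d, "->", find_keywords(d, keywords))
```

Whether a prefix match like this is strict enough depends on how noisy the descriptions are.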
I agree with you.
> On 03 Aug 2016, at 01:20, ayan guha wrote:
>
> I would stay away from transaction tables until they are fully baked. I do
> not see why you need to update vs keep inserting with timestamp and while
> joining derive latest value on the fly.
>
> But I
Phoenix will become another standard query interface of HBase. I do not agree
that using HBase directly will lead to faster performance. It always depends
on how you use it. While it is another component, it can make sense to use it.
This has to be evaluated on a case by case basis.
If you only
I would stay away from transaction tables until they are fully baked. I do
not see why you need to update vs keep inserting with timestamp and while
joining derive latest value on the fly.
But I guess it has become a religious question now :) and I am not
unbiased.
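A minimal sketch of the "keep inserting with timestamp, derive latest on the fly" idea, in plain Python rather than Hive or Spark. The row layout (key, timestamp, value) and the sample values are assumptions for illustration only.

```python
def latest_per_key(rows):
    """For each key, keep the value from the row with the greatest timestamp."""
    latest = {}
    for key, ts, value in rows:
        # ISO-8601 timestamps in the same format compare correctly as strings
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, value)
    return {key: value for key, (ts, value) in latest.items()}

# Hypothetical append-only history: rows are never updated, only added
rows = [
    ("SAINSBURYS SMKT", "2016-08-01T10:00:00", "grocer"),
    ("SAINSBURYS SMKT", "2016-08-02T09:30:00", "supermarket"),
    ("TFL TRAVEL", "2016-08-01T08:00:00", "transport"),
]
current = latest_per_key(rows)
print(current)
```

The trade-off is that every read pays for the deduplication, in exchange for never needing an in-place update.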
On 3 Aug 2016 08:51, "Mich
+1
> On Aug 2, 2016, at 2:29 PM, Jörn Franke wrote:
>
> If you need to use single inserts, updates, deletes, selects, why not use HBase
> with Phoenix? I see it as complementary to the hive / warehouse offering
>
>> On 02 Aug 2016, at 22:34, Mich Talebzadeh
Hi,
I decided to create a transactional catalog table in Hive, stored as ORC. That
table has two columns of value
1. transactiondescription === account_table.transactiondescription
2. hashtag String column created from a semi automated process of
deriving it from
Hi Mich,
It seems like an entity resolution problem - looking at different
representations of an entity - SAINSBURY in this case and matching them all
together. How dirty is your data in the description - are there stop words
like SACAT/SMKT etc you can strip off and get the base retailer entity
Thanks.
I believe there is some catalog of companies that I can get and store in
a table, and match the company name to the transactiondescription column.
That catalog should have sectors in it. For example company XYZ is under
Grocers etc which will make search and grouping much easier.
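A rough sketch of that lookup, assuming the catalog is small enough to hold as a company-to-sector mapping. The catalog entries and the helper name classify are hypothetical; a real catalog would come from whatever source is eventually chosen.

```python
# Hypothetical catalog: company name -> sector
catalog = {
    "SAINSBURY": "Grocers",
    "XYZ": "Grocers",
    "TFL": "Transport",
}

def classify(description, catalog):
    """Return (company, sector) for the first catalog name found in description."""
    upper = description.upper()
    for company, sector in catalog.items():
        # Plain substring test; a stricter regex may be needed for short names
        if company in upper:
            return company, sector
    return None, None

print(classify("Sainsburys smkt London", catalog))
print(classify("UNKNOWN SHOP", catalog))
```

Once each row carries a sector, grouping by sector becomes an ordinary group-by on that value.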
I believe
Well, if you still want to use window functions for your logic, then you need
to derive a new column, like "catalog", and use it as part of the grouping
logic.
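As a sketch of that idea in plain Python (not Spark): derive a "catalog" value per row, then aggregate on it. The patterns, the "OTHER" fallback, and the sample transactions are assumptions for illustration.

```python
import re
from collections import defaultdict

# Hypothetical (catalog name, pattern) pairs; the first match wins
PATTERNS = [
    ("SAINSBURY", re.compile(r"SAINSBURY", re.IGNORECASE)),
    ("TFL", re.compile(r"\bTFL\b", re.IGNORECASE)),
]

def derive_catalog(description):
    """Map a raw description to a catalog value, or "OTHER" if nothing matches."""
    for name, pattern in PATTERNS:
        if pattern.search(description):
            return name
    return "OTHER"

def total_by_catalog(transactions):
    """Sum amounts per derived catalog value."""
    totals = defaultdict(float)
    for description, amount in transactions:
        totals[derive_catalog(description)] += amount
    return dict(totals)

# Hypothetical (description, amount) rows
transactions = [
    ("SAINSBURYS SMKT LONDON", 25.0),
    ("SAINSBURY'S S/MKT CD 1234", 10.5),
    ("TFL TRAVEL CHARGE", 4.0),
    ("CASH WITHDRAWAL", 50.0),
]
totals = total_by_catalog(transactions)
print(totals)
```

In a SQL or Spark setting the same derived value would serve as the partition or grouping key.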
Maybe you can use a regex to derive this new column. The implementation
needs to depend on your data in