Could someone guide me on this one?

Regards,
Manik Singla
On Tue, Jun 11, 2019 at 5:58 PM Manik Singla <smanik...@gmail.com> wrote:
> Hey Team,
>
> I have started using Parquet recently.
>
> The data I save looks something like this:
>
> raw, hostname, cluster, serviceName
>
> where raw holds the actual log lines.
>
> For raw, the dictionary doesn't help, because no two log lines are the
> same. But if we tokenise the text into terms and put those terms in the
> dictionary, the dictionary could help filter out unwanted rows. For
> example, "parquet is a columnar format" would become "parquet", "is",
> "a", "columnar", "format".
>
> I also see mention of merging Bloom filter support, but I'm not sure
> whether tokenisation is being considered there.
>
> Is there any out-of-the-box way to tokenise text before it goes into
> the dictionary?
>
> Also, what are your views on adding it?
>
> Regards,
> Manik Singla
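The tokenisation idea in the quoted message can be sketched as follows: split each log line into terms, keep the set of distinct terms per row group, and let a reader skip any row group whose term set does not contain the searched term. This is a minimal sketch in plain Python under stated assumptions, not Parquet's API or behaviour: `tokenize`, `build_term_index`, `scan` and the sample `ROW_GROUPS` are hypothetical, the `\w+` regex is an assumed tokeniser, and an in-memory `set` stands in for a tokenised dictionary.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumed tokeniser: runs of word characters, lower-cased.
# Hypothetical choice for illustration; Parquet does not provide this.
TOKEN_RE = re.compile(r"\w+")


def tokenize(line: str) -> List[str]:
    """Split a raw log line into lower-case terms."""
    return [t.lower() for t in TOKEN_RE.findall(line)]


def build_term_index(lines: Iterable[str]) -> Set[str]:
    """Collect every distinct term in one row group (stand-in for a tokenised dictionary)."""
    terms: Set[str] = set()
    for line in lines:
        terms.update(tokenize(line))
    return terms


# Hypothetical sample data: each inner list plays the role of the `raw`
# column in one row group.
ROW_GROUPS: List[List[str]] = [
    ["parquet is a columnar format", "dictionary encoding helps repeated values"],
    ["connection refused on host-a", "retrying connection"],
]
INDEXES: List[Set[str]] = [build_term_index(rg) for rg in ROW_GROUPS]


def scan(term: str) -> Tuple[List[str], int]:
    """Return the lines containing `term` and how many row groups were actually read."""
    needle = term.lower()
    hits: List[str] = []
    groups_read = 0
    for lines, index in zip(ROW_GROUPS, INDEXES):
        if needle not in index:
            continue  # term absent from this row group's term set: skip it unread
        groups_read += 1
        hits.extend(line for line in lines if needle in tokenize(line))
    return hits, groups_read


if __name__ == "__main__":
    print(tokenize("parquet is a columnar format"))
    print(scan("connection"))
    print(scan("missing"))
```

Searching for "missing" reads no row groups at all, which is the pruning benefit described above. A Bloom filter built over the same tokens could replace the exact `set` to save space; it can produce false positives (a row group is read but holds no match) but never causes a matching row group to be skipped.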