Profiling limitation in the Docker image + additional sources beyond Hive

Enrico D'Urso Fri, 13 Apr 2018 02:30:08 -0700

Hi,

I am running Griffin from the Docker image, but in the profiling tab of the UI 
I see that the options are limited to:


  *   Simple Statistics
     *   Null count
     *   Distinct count
  *   Summary statistics
     *   Total Count
  *   Advanced statistics
     *   Enum detection TOP 5 count

Is there a particular reason why I cannot select “Regular Expression Match” 
under Advanced statistics? It would be a good feature to test.
Am I missing something, or is that feature not available in the Docker image?

Also, in the measure tab I see that I can select a source and a destination 
table to check, for instance, whether they are equal. Is there any plan to 
support checks between different data sources in the future? For example, 
between a Hive table and a Redshift table?
As you are using Spark as the computation engine, it should not be too hard to 
implement, since Spark has connectors for both systems.
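
To make the idea concrete, here is a rough sketch in plain Python of the kind 
of check I have in mind. It is not Griffin code: the function name `accuracy`, 
the column names and the sample rows are all my own assumptions. It only shows 
that, once both sides are loaded as rows, the comparison itself does not depend 
on where the data came from.

```python
def accuracy(source_rows, target_rows, columns):
    """Count how many source rows have an identical row in the target.

    Rows are compared only on the given columns, so the two inputs can
    come from any system that yields dict-like records.
    """
    def key(row):
        return tuple(row[c] for c in columns)

    # Index the target once so each source lookup is a set membership test.
    target_keys = {key(row) for row in target_rows}

    total = 0
    matched = 0
    for row in source_rows:
        total += 1
        if key(row) in target_keys:
            matched += 1

    # An empty source is treated as fully matched (an assumption of this sketch).
    ratio = matched / total if total else 1.0
    return matched, total, ratio


# Hypothetical sample rows standing in for a Hive table and a Redshift table.
hive_rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
redshift_rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "x"},
    {"id": 3, "name": "c"},
]

matched, total, ratio = accuracy(hive_rows, redshift_rows, ["id", "name"])
print(matched, total, ratio)
```

With Spark, I would expect the two inputs to be DataFrames loaded through the 
respective connectors, with the matching done as a join rather than an 
in-memory set.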


Thanks,

Enrico
