[GitHub] [druid] uuf6429 opened a new issue, #13784: Usability problems

via GitHub Thu, 09 Feb 2023 10:51:15 -0800


uuf6429 opened a new issue, #13784:
URL: https://github.com/apache/druid/issues/13784


   I'm completely new to Druid and I'm evaluating it as a DW+Analytics solution 
(using the docker-compose sample from this repo).
   
   Unfortunately, I've had some problems working with it. I think some of them 
might be intentional, but perhaps not so clear to a first-time user. Here goes:
   
   1. If data is not loaded successfully after ingestion (e.g. records do not 
match the preconfigured task regex), the _ingestion itself is marked as 
successful_, but _nothing is imported_. To me this is a problem for these 
reasons:
       a. this effectively means data might be / is being lost
       b. this is happening without warning
       c. I couldn't find any logging related to this (the logs panel was not 
useful for tracking this problem)
   2. When I had two ingestion tasks pointing to the same datasource, the last 
one ended up overwriting data from the previous one. I still can't figure out 
the exact reason, but:
       a. as a data platform, data should be sacred 😄 - it should triple-check 
with the user before data is overwritten or dropped automatically
       b. while investigating, I came across the "segment granularity" setting, 
which I assume relates to my problem - again, if it can potentially remove 
data, it should be visually prominent
       c. in my opinion, the UX should be geared to a more cautious approach - 
i.e., it's easier to delete data than to get it back, so maybe it's better to 
have defaults that ensure data is retained
   3. Not sure if this is a bug. I had some segments with a "0" in the "Num 
rows" column, but the "Records" tab in the segments modal actually showed a 
record.
   4. I don't know what I did but at one point, while ingestion still worked, 
datasources were all marked as "unavailable" (yellow circle) and I had to 
restart all the containers to get it unstuck.
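
   To illustrate point 1, here is a minimal sketch of the kind of check that 
would flag a "successful" task that imported nothing. It assumes the task 
report JSON (which native batch tasks are believed to expose at 
`GET /druid/indexer/v1/task/{taskId}/reports`) nests its counters under 
`ingestionStatsAndErrors.payload.rowStats.buildSegments`; treat that layout, 
the function name `summarize_row_stats`, and the sample report as assumptions 
for illustration only.

```python
def summarize_row_stats(report: dict) -> dict:
    """Pull row counters out of a task report (assumed layout, see lead-in)."""
    stats = (
        report.get("ingestionStatsAndErrors", {})
        .get("payload", {})
        .get("rowStats", {})
        .get("buildSegments", {})
    )
    processed = stats.get("processed", 0)
    unparseable = stats.get("unparseable", 0)
    thrown_away = stats.get("thrownAway", 0)
    return {
        "processed": processed,
        "unparseable": unparseable,
        "thrownAway": thrown_away,
        # A task that finished without ingesting any row: the case in point 1.
        "suspicious": processed == 0,
    }


# Hand-written sample report, not real Druid output.
sample_report = {
    "ingestionStatsAndErrors": {
        "payload": {
            "rowStats": {
                "buildSegments": {
                    "processed": 0,
                    "unparseable": 12,
                    "thrownAway": 0,
                }
            }
        }
    }
}

print(summarize_row_stats(sample_report))
```

   A caller such as the NiFi flow could run a check like this after each task 
and raise an alert when `suspicious` is true, instead of trusting the task 
status alone.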
   
   
   
   In the end, my current objective is to get statistics from a bunch of URLs 
(each URL returns a flat JSON object).
   The tasks for this were being triggered by NiFi (via the Druid POST API). 
Speaking in DBMS terms, I'd like a row for each URL in the same stats table 
(datasource in Druid terms, right?).
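
   For reference, a minimal sketch of a task submission for this use case, 
written as an assumption rather than a diagnosis of point 2. Native batch tasks 
have an `appendToExisting` flag in `ioConfig`; as far as I can tell it defaults 
to `false`, in which case a task replaces existing segments in the time 
intervals it writes, which would match the overwriting described above. The 
datasource name, the `timestamp` column, the example URL, and the router 
address `http://localhost:8888` are all hypothetical.

```python
import json
import urllib.request


def build_spec(datasource: str, urls: list) -> dict:
    """Build a native batch ingestion spec that appends instead of replacing."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumes each JSON object carries a "timestamp" field.
                "timestampSpec": {"column": "timestamp", "format": "auto"},
                # Empty list: let Druid discover dimensions from the data.
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "http", "uris": list(urls)},
                "inputFormat": {"type": "json"},
                # Keep rows written by earlier tasks to the same intervals.
                "appendToExisting": True,
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


def submit_task(spec: dict, base_url: str = "http://localhost:8888") -> dict:
    """POST the spec to the task endpoint; base_url is an assumed address."""
    request = urllib.request.Request(
        base_url + "/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


spec = build_spec("stats", ["http://example.com/stats-a.json"])
print(json.dumps(spec["spec"]["ioConfig"], indent=2))
```

   Listing all URLs in a single task's `uris` would be another way to get one 
row per URL into the same datasource without tasks competing over the same 
interval.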


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

