uuf6429 opened a new issue, #13784:
URL: https://github.com/apache/druid/issues/13784
I'm completely new to Druid and I'm evaluating it as a DW+Analytics solution
(using the docker-compose sample from this repo).
Unfortunately, I've had some problems working with it. Some of them might be
intentional, but they are not obvious to a first-time user. Here goes:
1. If data is not loaded successfully after ingestion (e.g. records do not
match the preconfigured task regex), the _ingestion itself is marked as
successful_, but _nothing is imported_. To me this is a problem for these
reasons:
a. this effectively means data might be / is being lost
b. this is happening without warning
c. I couldn't find any logging related to this (the logs panel was not
useful for tracking down this problem)
2. When I had two ingestion tasks pointing to the same datasource, the last
one ended up overwriting data from the previous one. I still can't figure out
the exact reason, but:
a. as a data platform, data should be sacred 😄 - it should triple check
with the user if data is overwritten or dropped automatically
b. while investigating, I came across the "segment granularity" setting,
which I assume relates to my problem - again, if it can potentially remove
data, that should be visually prominent
c. in my opinion, the UX should be geared towards a more cautious approach -
i.e., it's easier to delete data than to get it back, so maybe it's better to
have defaults that ensure data is retained
3. Not sure if this is a bug. I had some segments with a "0" in the "Num
rows" column, but the "Records" tab in the segments modal actually showed a
record.
4. I don't know what I did but at one point, while ingestion still worked,
datasources were all marked as "unavailable" (yellow circle) and I had to
restart all the containers to get it unstuck.
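Regarding point 1, here is a minimal Python sketch of the kind of check I would have expected to be surfaced somewhere. It inspects a task report (as returned by `GET /druid/indexer/v1/task/{taskId}/reports`) and flags a task that finished with no rows ingested while some rows were thrown away or unparseable. The report shape used here (`ingestionStatsAndErrors.payload.rowStats.buildSegments`) is an assumption based on the documented report for simple batch tasks and may differ for other task types; the function names are hypothetical.

```python
from typing import Any, Dict

# Row counters expected in the "buildSegments" phase of a task report.
# ASSUMPTION: this layout follows the documented report for simple batch tasks.
COUNTERS = ("processed", "processedWithError", "thrownAway", "unparseable")


def build_segments_stats(report: Dict[str, Any]) -> Dict[str, int]:
    """Extract the buildSegments row counters, defaulting missing ones to 0."""
    payload = (report.get("ingestionStatsAndErrors") or {}).get("payload") or {}
    phase = (payload.get("rowStats") or {}).get("buildSegments") or {}
    return {name: int(phase.get(name) or 0) for name in COUNTERS}


def looks_like_silent_drop(report: Dict[str, Any]) -> bool:
    """True when nothing was ingested but some rows were rejected."""
    stats = build_segments_stats(report)
    ingested = stats["processed"] + stats["processedWithError"]
    rejected = stats["thrownAway"] + stats["unparseable"]
    return ingested == 0 and rejected > 0


if __name__ == "__main__":
    # Hypothetical report for a task marked SUCCESS that imported nothing.
    sample = {
        "ingestionStatsAndErrors": {
            "payload": {
                "rowStats": {
                    "buildSegments": {
                        "processed": 0,
                        "processedWithError": 0,
                        "thrownAway": 0,
                        "unparseable": 4,
                    }
                }
            }
        }
    }
    print(looks_like_silent_drop(sample))
```

A warning in the console based on something like this would have made the "successful but empty" case much easier to spot.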
In the end, my current objective is to collect statistics from a bunch of URLs
(each URL returns a flat JSON object).
The tasks for this were being triggered by NiFi (via the Druid POST API).
Speaking in DBMS terms, I'd like a row for each URL in the same stats table
(a datasource in Druid terms, right?).
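To make the goal concrete, here is a rough Python sketch of the kind of task I would expect to submit for each batch of URLs. It builds a native batch (`index_parallel`) spec with `appendToExisting` set to true in `ioConfig`, on the unconfirmed assumption that the overwriting in point 2 comes from that flag being false by default. The datasource name, the `timestamp` column, the example URLs and the base address are placeholders, not values from my setup.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_append_spec(datasource: str, urls: List[str],
                      timestamp_column: str = "timestamp") -> Dict[str, Any]:
    """Build a native batch spec that appends to an existing datasource.

    ASSUMPTION: each URL returns one flat JSON object that contains a
    timestamp field named `timestamp_column` (hypothetical name).
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                # An empty dimension list lets Druid pick up the remaining
                # columns as string dimensions.
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "http", "uris": list(urls)},
                "inputFormat": {"type": "json"},
                # Add new segments instead of replacing existing ones.
                "appendToExisting": True,
            },
            "tuningConfig": {
                "type": "index_parallel",
                # Appending is only supported with dynamic partitioning.
                "partitionsSpec": {"type": "dynamic"},
            },
        },
    }


def submit_task(spec: Dict[str, Any], base_url: str) -> str:
    """POST the spec to the task endpoint and return the task id.

    `base_url` is a placeholder for wherever the Druid API is reachable.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["task"]


if __name__ == "__main__":
    # Hypothetical datasource and URLs, printed only; nothing is submitted.
    spec = build_append_spec("url_stats", ["http://example.com/a/stats.json"])
    print(json.dumps(spec, indent=2))
```

If this is roughly the intended way to add rows to a datasource, it would help if the console made the append-versus-overwrite choice explicit.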