The performance of SQLite doesn't matter, as it is restricted to a single worker
anyway -- it's definitely not recommended for running in production.

-ash

> On 16 Aug 2018, at 08:33, George Leslie-Waksman <waks...@gmail.com> wrote:
> 
> These performance characteristics are metadata database backend dependent
> as well. If there are benchmarks, I would hope we look at them across
> sqlite, mysql, postgresql, and any other supported backends before we take
> action.
> 
> On Thu, Aug 9, 2018 at 12:41 PM Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
> 
>> The change on perf for the DAG table would be extremely negligible.
>> 
>> Maybe for task_instances (large table with millions of rows, 3 fields
>> composite key) it *could* be a decent idea. Though you'd then need to have
>> two indexes to store and maintain and we may have to change the code to
>> actually use and reference that new more efficient pk in places where it's
>> more efficient to use that index (some of it SQLAlchemy would do right out
>> of the box).
>> 
>> This mostly affects the index size (btree(id) is much smaller than
>> btree(dag_id, task_id, execution_date)), not the key lookup time, which
>> is log(n) either way. We'd still have to use the composite btree when we
>> want to do range scans, which we use frequently to get sets of tasks for a
>> dag or a specific dag task. Since lookups are log(n), and we need to
>> maintain that composite btree anyway for range scans, I don't see where
>> that would really help. It would be a better index (fewer pages, less
>> memory usage, ...) if we didn't need that other composite one, which we do.
>> 
>> Max
>> 
>> On Thu, Aug 9, 2018 at 8:05 AM Vardan Gupta <vardangupta...@gmail.com>
>> wrote:
>> 
>>> Point well taken on backward compatibility. We will have to approach this
>>> change very carefully, if it is implemented.
>>> 
>>> On Thu, Aug 9, 2018 at 7:29 PM Юли Волкова <xnuins...@gmail.com> wrote:
>>> 
>>>> Because what you described says nothing about backward compatibility.
>>>> Do you think everyone who uses it needs to change all their DAGs? It's
>>>> very strange, because you are proposing one of the most critical changes
>>>> and it will affect everyone. If you want an id, call it dag_metadata_id
>>>> and add it. But don't propose a change that isn't backward compatible.
>>>> It's too strange.
>>>> 
>>>> On Thu, Aug 9, 2018 at 7:04 AM vardangupta...@gmail.com <
>>>> vardangupta...@gmail.com> wrote:
>>>> 
>>>>> 
>>>>> 
>>>>> On 2018/08/09 11:55:11, Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>>> Absolutely - there will still need to be a human-readable DAG id, even
>>>>>> if we end up with an auto-incrementing integer ID column internally
>>>>>> for table join performance reasons.
>>>>>> 
>>>>>> -ash
>>>>>> 
>>>>>>> On 9 Aug 2018, at 12:35, Юли Волкова <xnuins...@gmail.com> wrote:
>>>>>>> 
>>>>>>> How will you understand what your DAG 00002 is doing without opening
>>>>>>> it? For each of 100, for example? Especially if you are not the
>>>>>>> developer who created it, but a support team with 120 DAGs.
>>>>>>> 
>>>>>>> This is the first time I have wanted to send an answer to the dev
>>>>>>> mailing list as well. Please, don't do it.
>>>>>>> 
>>>>>>> I think it will be really bad for everyone who uses dag_id as a
>>>>>>> descriptive name for the dag. If I look at 0329313, it does not tell
>>>>>>> me anything useful, and it will be very complicated to identify which
>>>>>>> process the dag is used for. There could be another id for the
>>>>>>> indexes in the DB if it's a real problem for somebody. But, please,
>>>>>>> do not change dag_id.
>>>>>>> 
>>>>>>> On Mon, Aug 6, 2018 at 1:32 AM vardangupta...@gmail.com <
>>>>>>> vardangupta...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Everyone,
>>>>>>>> 
>>>>>>>> Do we have any plan to change the type of dag_id from String to
>>>>>>>> Number? This would make queries on the metadata more performant.
>>>>>>>> The proposal could be to generate an auto-incrementing value in the
>>>>>>>> dag table and use this id in the rest of the other tables.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Vardan Gupta
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> _________
>>>>>>> 
>>>>>>> Best regards, Юлия Волкова
>>>>>>> Tel.: +7 (911) 116-71-82
>>>>>> 
>>>>>> 
>>>>> 
>>>>> Thanks Ash for your reply, I am aligned with what you're saying.
>>>>> 
>>>>> I was not proposing to take away the human-readable dag_id. Instead I
>>>>> was thinking: why can't we create another field like dag_name, which
>>>>> would hold this information on all front-facing sites, while dag_id is
>>>>> changed to an integer? This would help make joins faster in the
>>>>> metastore. dag_id is currently indexed, but an index on varchar(250) is
>>>>> still going to take more index blocks than one on int (4 bytes), and
>>>>> therefore more lookup time. Also, if dag_id is not trivial to change to
>>>>> int, let it stay as it is and let's introduce another column which is
>>>>> actually an integer type, and let joins happen on this column across
>>>>> all tables.
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> _________
>>>> 
>>>> Best regards, Юлия Волкова
>>>> Tel.: +7 (911) 116-71-82
>>> 
>> 
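
For reference, the trade-off discussed in the quoted thread can be sketched
roughly as below. This is only an illustration under stated assumptions: the
table layout is a simplified, hypothetical stand-in and not Airflow's actual
metadata schema, the index name ti_natural_key and the sample rows are made
up, and SQLite is used purely so the snippet is self-contained (it says
nothing about real-world performance). It shows a surrogate integer id living
next to the human-readable dag_id, with the composite index still in place
for the "all tasks of a dag" kind of query.

```python
import sqlite3

# Hypothetical, simplified schema; NOT Airflow's real metadata tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task_instance (
        id             INTEGER PRIMARY KEY,  -- surrogate integer key
        dag_id         TEXT NOT NULL,        -- human-readable name is kept
        task_id        TEXT NOT NULL,
        execution_date TEXT NOT NULL,
        state          TEXT
    );
    -- The composite index is still maintained for per-dag scans.
    CREATE UNIQUE INDEX ti_natural_key
        ON task_instance (dag_id, task_id, execution_date);
""")

# Made-up sample rows, for illustration only.
rows = [
    ("example_dag", "extract", "2018-08-09", "success"),
    ("example_dag", "load", "2018-08-09", "running"),
    ("other_dag", "extract", "2018-08-09", "success"),
]
conn.executemany(
    "INSERT INTO task_instance (dag_id, task_id, execution_date, state) "
    "VALUES (?, ?, ?, ?)",
    rows,
)


def plan(sql, params=()):
    """Return SQLite's query plan for sql as a single string."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    # The last column of each plan row holds the readable detail text.
    return " | ".join(row[-1] for row in cursor)


# Point lookup through the surrogate integer key.
by_id_plan = plan("SELECT state FROM task_instance WHERE id = ?", (1,))

# Per-dag query: served by the composite index, not by the integer key.
by_dag_plan = plan(
    "SELECT task_id, execution_date FROM task_instance WHERE dag_id = ?",
    ("example_dag",),
)

print(by_id_plan)
print(by_dag_plan)
```

The exact plan text differs between SQLite versions, so no expected output is
given here. The point is only that both indexes end up being used for
different queries, which is why adding the integer key does not remove the
need for the composite one.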
