Re: Bringing automatic compaction into trunk

Jan Lehnardt Tue, 16 Aug 2011 07:07:08 -0700

On Aug 16, 2011, at 4:00 PM, Robert Newson wrote:

> Ok, let's see Paul's code concerns addressed first; it needs that
> cleanup before it can hit trunk.
> 
> I'd still prefer to see an event-driven rather than polling approach,
> e.g., hook into update_notifier and build a queue of databases that are
> actively being written to (and therefore growing). A much lazier
> background thing could compact databases that are inactive.
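
A minimal sketch of the event-driven idea quoted above, written in Python purely for illustration. The on_update callback stands in for whatever an update_notifier hook would deliver; the class name, method names and the idle_after threshold are all hypothetical and are not CouchDB APIs. It only shows the bookkeeping (a de-duplicated queue of recently written databases plus a lazy check for inactive ones), not compaction itself.

```python
import time
from collections import OrderedDict


class CompactionQueue:
    """Hypothetical bookkeeping for event-driven compaction candidates."""

    def __init__(self, idle_after=3600.0, clock=time.monotonic):
        self._clock = clock              # injectable so the sketch is testable
        self._idle_after = idle_after    # seconds without writes = "inactive"
        self._last_update = {}           # db name -> time of last update event
        self._pending = OrderedDict()    # de-duplicated FIFO of active dbs

    def on_update(self, db_name):
        """Record an update event (assumed to come from a notifier hook)."""
        self._last_update[db_name] = self._clock()
        # A database already waiting in the queue keeps its position.
        self._pending.setdefault(db_name, None)

    def next_active(self):
        """Pop the longest-waiting active database, or None if none is queued."""
        if not self._pending:
            return None
        db_name, _ = self._pending.popitem(last=False)
        return db_name

    def inactive(self, all_dbs):
        """Return databases with no update event in the last idle_after seconds."""
        now = self._clock()
        never = float("-inf")
        return [
            db for db in all_dbs
            if now - self._last_update.get(db, never) >= self._idle_after
        ]
```

A worker would call next_active() frequently to decide which growing databases to examine, while a much less frequent job could walk inactive(all_dbs). Whether a candidate actually needs compacting (fragmentation thresholds and so on) is deliberately left out.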


Yup, my argument assumed that all of that gets sorted out as an
"implementation detail". Back to JIRA.

Cheers
Jan
-- 

> 
> B.
> 
> On 16 August 2011 14:48, Jan Lehnardt <[email protected]> wrote:
>> 
>> On Aug 16, 2011, at 3:44 PM, Robert Newson wrote:
>> 
>>> All good points Jan, thanks.
>>> 
>>> Having large numbers of databases is one thing, but I'm focused on the
>>> impact on ongoing operations with this running in the background. What
>>> does it do to the user's experience to have all dbs scanned
>>> periodically, etc?
>>> 
>>> The reason I suggest doing it after the move, and in its own app, is
>>> to reduce the work needed to not use this code in some circumstances
>>> (Cloudant hosting, for example). Yes, it's a separate module and
>>> disabled by default, but putting it in its own application will make
>>> the separation much more explicit and preclude unintended
>>> entanglements with core over time.
>> 
>> I think this is a valid concern, but I don't think it outweighs the
>> disadvantage. I'm happy to spend time to make sure this is properly
>> modular after srcmv.
>> 
>> Cheers
>> Jan
>> --
>> 
>> 
>>> 
>>> B.
>>> 
>>> On 16 August 2011 14:31, Jan Lehnardt <[email protected]> wrote:
>>>> 
>>>> On Aug 16, 2011, at 2:59 PM, Robert Newson wrote:
>>>> 
>>>>> I'm -1 on the approach (as I understand it) taken by the scheduler as
>>>>> it will be problematic in precisely the circumstance when you'd most
>>>>> want auto compaction (large numbers of databases and views).
>>>> 
>>>> As Filipe mentions in the ticket, this was tested with large numbers of
>>>> databases.
>>>> 
>>>> In addition, your "most want" assumption doesn't hold for the average
>>>> user, I'd wager (no numbers, alas). I'd say it's a basic user-experience
>>>> plus when software doesn't waste a system resource without
>>>> cleaning up after itself. But this isn't even suggesting to enable this by
>>>> default. We have plenty of other features that need proper documentation
>>>> to be used correctly and that we are improving over time to make them
>>>> more obvious by removing common errors or odd behaviour.
>>>> 
>>>>> To this point "Just curious, would it make a big difference to commit
>>>>> the patch before srcmv and migrate it with the rest of the code base
>>>>> rather than letting it rot in JIRA and leave it all to Filipe to keep
>>>>> it updated." -- I'm -∞ on any suggestion that code should be put in
>>>>> trunk to stop it from rotting. Code should land when it's ready. I
>>>>> hope we're all agreed on that and that this paragraph was redundant.
>>>> 
>>>> I was suggesting that the patch is ready enough for trunk and that
>>>> the level of readiness should not be "solves all possible cases",
>>>> especially for something that is disabled by default. If we take this
>>>> to the extreme, we'd never add any new features.
>>>> 
>>>> I'm not suggesting "it compiles for me, let's throw it into trunk".
>>>> 
>>>>> After srcmv, and then some work to OTP-ify each of the resultant
>>>>> subdirs, we should add this as a separate application. We might also
>>>>> mark it as beta in the first release to gather feedback from the
>>>>> community.
>>>> 
>>>> I don't see how that is any different from adding it before srcmv and
>>>> avoiding leaving the front-porting effort to a single person.
>>>> 
>>>> Ideally we'd already have srcmv done, but we don't and I don't want
>>>> to hold off progress for an architecture change.
>>>> 
>>>>> I'll be accused of 'stop energy' within nanoseconds of this post so I
>>>>> should end by saying I'm +1 on couchdb gaining the ability to
>>>>> automatically compact its databases and views in principle.
>>>> 
>>>> :)
>>>> 
>>>> Cheers
>>>> Jan
>>>> --
>>>> 
>>>> 
>>>>> 
>>>>> B.
>>>>> 
>>>>> On 16 August 2011 13:19, Jan Lehnardt <[email protected]> wrote:
>>>>>> Good points Robert,
>>>>>> 
>>>>>> I replied inline and then hijacked the thread for a more general
>>>>>> discussion, sorry about that :)
>>>>>> 
>>>>>> On Aug 16, 2011, at 2:08 PM, Robert Dionne wrote:
>>>>>> 
>>>>>>> Filipe,
>>>>>>> 
>>>>>>>  This is neat, I can definitely see the utility of the approach. I do 
>>>>>>> share the concerns expressed in other comments with respect to the use 
>>>>>>> of the config file for per db compaction specs and the use of a 
>>>>>>> compact_loop that waits on config change messages when the ets table is 
>>>>>>> empty. I don't think it fully takes into account the use case of large 
>>>>>>> numbers of small dbs and/or some very large dbs interspersed with a lot 
>>>>>>> of mid-size dbs.
>>>>>> 
>>>>>> As I said in the ticket, per-db config is desirable, but I think it is
>>>>>> outside the scope of the ticket.
>>>>>> 
>>>>>>>  Anyway, I like it a lot, though I've only read the code for half an
>>>>>>> hour or so. I also agree with others that the code base is reaching a
>>>>>>> point of being a bit crufty and it might be a good time, with the git
>>>>>>> migration, etc., to take a breath and commit to making some of these
>>>>>>> OTP-compliant changes and design changes we've talked about.
>>>>>> 
>>>>>> Just curious, would it make a big difference to commit the patch before 
>>>>>> srcmv and migrate it with the rest of the code base rather than letting 
>>>>>> it rot in JIRA and leave it all to Filipe to keep it updated.
>>>>>> 
>>>>>> I also fear that a srcmv'd release is still out a bit and I'd really 
>>>>>> like to see this one (and a few others) go into 1.2 (as per my previous 
>>>>>> mail to this list in another thread). While it isn't the absolute 
>>>>>> perfect solution in all cases, it is disabled by default and manual 
>>>>>> compaction strategies work as they did before. In the meantime, we can 
>>>>>> refine the rest of the system to make it more fully fledged and maybe 
>>>>>> even enable it by default a few versions down when we are all 
>>>>>> comfortable with it. I'm not very comfortable keeping good patches in 
>>>>>> JIRA and not trunk until they solve every little edge case. We haven't 
>>>>>> worked like this in the past and I don't think it is worth doing.
>>>>>> 
>>>>>> Cheers
>>>>>> Jan
>>>>>> --
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Regards,
>>>>>>> 
>>>>>>> Bob
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Aug 15, 2011, at 9:29 PM, Filipe David Manana wrote:
>>>>>>> 
>>>>>>>> Developers, users,
>>>>>>>> 
>>>>>>>> It's been a while now since I opened a Jira ticket for it (
>>>>>>>> https://issues.apache.org/jira/browse/COUCHDB-1153 ).
>>>>>>>> I won't describe it here with detail since it's already done in the 
>>>>>>>> Jira ticket.
>>>>>>>> 
>>>>>>>> Unless there are objections, I would like to get this moving soon.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Filipe David Manana,
>>>>>>>> [email protected], [email protected]
>>>>>>>> 
>>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>>> Unreasonable men adapt the world to themselves.
>>>>>>>> That's why all progress depends on unreasonable men."
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>>
