Re: [commons-io] question: file content merge sort and binary search

2023-07-20 Thread ssz
> I already commented elsewhere on why this is not a good fit for IO

My bad, I did not realise that the discussion had been closed with a final
resolution. I read "what do others think?" and thought that it was an
invitation to continue the discussion.

> Instead of keeping on arguing to shove your library in IO

I took this discussion as an open discussion; sorry if I am annoying you.
I always think that being open to discussion, talking to other people and
looking for solutions together is a good strategy: it makes finding a good
solution more likely.

And sorry, but honestly, I still see few good arguments, if any.
The last one, about ANT, does not seem to me (just my opinion, maybe
wrong) a very strong argument.
I tried to explain why, maybe not clearly enough; since you mention ANT
again, my arguments were evidently not clear.
Apache ANT is great, but in my opinion it does not provide this
functionality.
Also, if I remember correctly, I tried to explain why Commons CSV does
not do what my solution does.
If you believe that, roughly speaking, my solution does in-memory sorting
or CSV parsing (or that these solutions are comparable), then we have
different points of view, and we will not reach a consensus.
And I think this is also my fault.

> I would survey other projects (see above) to see if common functionality
> could be extracted and more importantly if these projects would then be
> interested in relying on a new library (where it may reside) instead of
> maintaining their own code.
> ...
> and maybe elsewhere (Tika, Lucene, Solr, Spark?)

Other Apache libraries: great idea! I will think about it. Thank you for
that point, it is really helpful.

As I already said, I am not insisting on Commons IO.
I am just looking for alternatives and want to hear other people's opinions.
Maybe some other Commons or non-Commons library would fit; I am not familiar
with the whole ecosystem, and I hoped someone could give me a hint.
And you actually did, twice about ANT and several times about Commons
CSV; I really appreciate it.
That was one of the reasons why I wrote here.

The JDK itself supports file and I/O operations, and it also supports sorting
and binary search. The proposed functionality is certainly out of the JDK's
scope, but it seemed to me to be close to what the JDK offers. Obviously I
was wrong.
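For comparison, the in-memory route the JDK already offers takes only a few lines. The sketch below (the class and method names are illustrative, not taken from the proposed library) loads the whole file into the heap, which is exactly the limitation the proposal is about:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class InMemoryLines {
    /** Reads every line into the heap, sorts, and writes the result: memory grows with file size. */
    static List<String> sortFile(Path source, Path target) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(source, StandardCharsets.UTF_8));
        Collections.sort(lines); // natural String order
        Files.write(target, lines, StandardCharsets.UTF_8);
        return lines;
    }

    /** Binary search over a list that is already sorted in natural order. */
    static boolean contains(List<String> sortedLines, String key) {
        return Collections.binarySearch(sortedLines, key) >= 0;
    }
}
```

Memory use is proportional to the file size, so for files that do not fit in the heap an on-disk (external) approach is needed instead.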

I think this discussion can be closed.
If anyone else has something to add, please feel free to email me directly.

Again, sorry for the disturbance.
I didn't want to bother anyone, and didn't realise that this is what I was
doing.
Thanks for taking the time and for trying to explain to me where I'm wrong.

Have a nice day!

One more thing:
> Too much like a database operation, IO is a lower level library, and so
> on.

One of my colleagues also thinks that Commons IO is not quite the right
place for this proposal. So I am now convinced that it really is not a
suitable place.




On Thu, Jul 20, 2023 at 4:16 PM Gary Gregory  wrote:

> [...]

Re: [commons-io] question: file content merge sort and binary search

2023-07-20 Thread Gilles Sadowski
On Thu, Jul 20, 2023 at 15:18, Gary Gregory  wrote:
>
>  [...] Instead of keeping on arguing to shove your library in [...]

If we could stop the brutal language... (?)

The OP asked politely, and was ready to wait indefinitely
(unsubscribing from this ML) for an answer; I just wanted
to make sure that there were no missed opportunities (on
both ends).

Regards,
Gilles

> [...]

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [commons-io] question: file content merge sort and binary search

2023-07-20 Thread Gary Gregory
I already commented elsewhere (can't recall if it was on this list or
github) on why this is not a good fit for IO. Too much like a database
operation, IO is a lower level library, and so on. IO is not a kitchen sink
for anything related to IO. Like Lang, it was initially conceived as a
library for low-level operations that could be imagined to be in the JDK.
It's actually perfectly fine that the JDK does not contain such operations
as it should not be a kitchen sink either, but only provide primitive
operations. IO also does not contain CSV operations, that's in Commons CSV.
IO also does not contain high-level operations, projects like Apache Tika,
Lucene, and Solr do that. This still feels like a component that provides
one narrow purpose that should live in its own project, which it already
does, yours, and also happens to already exist within Apache in Ant and
maybe elsewhere (Tika, Lucene, Solr, Spark?). So I think you are going
about this backward: Instead of keeping on arguing to shove your library in
IO, I would survey other projects (see above) to see if common
functionality could be extracted and more importantly if these projects
would then be interested in relying on a new library (where it may reside)
instead of maintaining their own code. It does not matter if the common code is
derived from your library or existing projects (assuming proper licensing),
what matters is improving the Apache ecosystem, and FOSS in general. If you
are interested in I/O code and this interest matches the Commons IO
component of the Commons project, then great, there are some recent and not
so recent Jira tickets that could use some attention.

Gary

On Thu, Jul 20, 2023, 08:09 ssz  wrote:

> That's great!
> - But ANT is quite an ancient system, and it is now relatively unknown.
> - And it is relatively heavy. Maybe it's better to have single-function in
> the dedicated library or in well-known library with other useful features
> - It uses in-memory sorting:
>
> https://github.com/apache/ant/blob/master/src/main/org/apache/tools/ant/filters/SortFilter.java#L352
> - What about binary search?
>
> [...]


Re: [commons-io] question: file content merge sort and binary search

2023-07-20 Thread ssz
> Which projects (ASF?) depend on your proposed contribution?

Currently only commercial projects, not open-source ones.
I'm thinking about an RDF-Graph backed by the file system. If I implement it,
I will raise an issue with the Apache Jena team.
The original library will probably become multiplatform, in which case its
JVM part could use Commons.

+ I'm thinking about more use cases. Sorting and searching might be useful
for working with logs.
+ I have asked colleagues; maybe we will find more examples.
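As an illustration of the log use case, a lower-bound binary search can cut a time window out of log lines sorted by timestamp. A minimal in-memory sketch, assuming each line starts with a fixed-width ISO-8601 UTC timestamp (so lexicographic order matches chronological order); `LogWindow` is a hypothetical helper, not part of any library discussed here:

```java
import java.util.List;

final class LogWindow {
    /** Index of the first element that is >= key (a classic lower-bound binary search). */
    static int lowerBound(List<String> sorted, String key) {
        int lo = 0;
        int hi = sorted.size(); // the answer lies in [lo, hi]
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).compareTo(key) < 0) {
                lo = mid + 1; // everything up to mid is < key
            } else {
                hi = mid;     // mid is still a candidate
            }
        }
        return lo;
    }

    /**
     * Lines logged in the half-open interval [from, to). Assumes every line starts with a
     * fixed-width ISO-8601 UTC timestamp, so string order equals chronological order.
     */
    static List<String> between(List<String> sortedLines, String from, String to) {
        int start = lowerBound(sortedLines, from);
        int end = Math.max(start, lowerBound(sortedLines, to)); // empty window if to < from
        return sortedLines.subList(start, end);
    }
}
```

A line whose timestamp equals `from` compares greater than the bare `from` string (same prefix, longer), so it is included; a line stamped exactly `to` is excluded for the same reason.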

On Thu, Jul 20, 2023 at 2:37 PM Gilles Sadowski 
wrote:

> [...]


Re: [commons-io] question: file content merge sort and binary search

2023-07-20 Thread ssz
Sure, you can sort in memory, no problem there. I think we need to find
a well-known library that provides non-in-memory sorting and binary search,
preferably a relatively recent one (Java 8+).
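The usual non-in-memory technique is an external merge sort: read the input in bounded chunks, sort each chunk in memory and spill it to a temporary file (a "run"), then k-way merge the runs with a priority queue. A minimal Java 8 sketch, assuming newline-delimited UTF-8 text and using a line count (`maxLinesInMemory`, a hypothetical parameter) as a stand-in for a real memory budget; it is not the proposed library's API:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class ExternalSort {
    /** Sorts the lines of input into output, keeping at most maxLinesInMemory lines per run. */
    static void sort(Path input, Path output, Comparator<String> cmp, int maxLinesInMemory)
            throws IOException {
        List<Path> runs = new ArrayList<>();
        try {
            // Phase 1: split the input into sorted runs on disk.
            try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                List<String> chunk = new ArrayList<>();
                for (String line = in.readLine(); line != null; line = in.readLine()) {
                    chunk.add(line);
                    if (chunk.size() >= maxLinesInMemory) {
                        runs.add(writeRun(chunk, cmp));
                        chunk.clear();
                    }
                }
                if (!chunk.isEmpty()) {
                    runs.add(writeRun(chunk, cmp));
                }
            }
            // Phase 2: merge all runs into the output file.
            merge(runs, output, cmp);
        } finally {
            for (Path run : runs) {
                Files.deleteIfExists(run);
            }
        }
    }

    private static Path writeRun(List<String> chunk, Comparator<String> cmp) throws IOException {
        chunk.sort(cmp);
        Path run = Files.createTempFile("run-", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    /** The next unread line of one run, plus the reader it came from. */
    private static final class Head {
        final String line;
        final BufferedReader reader;

        Head(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }
    }

    private static void merge(List<Path> runs, Path output, Comparator<String> cmp) throws IOException {
        PriorityQueue<Head> heads = new PriorityQueue<>((a, b) -> cmp.compare(a.line, b.line));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    heads.add(new Head(first, reader));
                }
            }
            // Only one pending line per run is held in memory at any time.
            while (!heads.isEmpty()) {
                Head smallest = heads.poll();
                out.write(smallest.line);
                out.newLine();
                String next = smallest.reader.readLine();
                if (next != null) {
                    heads.add(new Head(next, smallest.reader));
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

Peak memory is roughly one chunk during phase 1 and one line per run during phase 2; the price is writing and reading every line twice.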

On Thu, Jul 20, 2023 at 3:08 PM ssz  wrote:

> That's great!
> - But ANT is quite an ancient system, and it is now relatively unknown.
> - And it is relatively heavy. Maybe it's better to have single-function in
> the dedicated library or in well-known library with other useful features
> - It uses in-memory sorting:
> https://github.com/apache/ant/blob/master/src/main/org/apache/tools/ant/filters/SortFilter.java#L352
> - What about binary search?
>
> [...]


Re: [commons-io] question: file content merge sort and binary search

2023-07-20 Thread ssz
That's great!
- But ANT is quite an ancient system, and it is now relatively unknown.
- And it is relatively heavy. Maybe it's better to have this single function
in a dedicated library, or in a well-known library with other useful features
- It uses in-memory sorting:
https://github.com/apache/ant/blob/master/src/main/org/apache/tools/ant/filters/SortFilter.java#L352
- What about binary search?

On Thu, Jul 20, 2023 at 2:56 PM Gary Gregory  wrote:

> Note that Apache Ant already provides similar functionality:
>
> https://ant.apache.org/manual/api/org/apache/tools/ant/filters/SortFilter.html
>
> Gary
>
> [...]


Re: [commons-io] question: file content merge sort and binary search

2023-07-20 Thread Gary Gregory
Note that Apache Ant already provides similar functionality:
https://ant.apache.org/manual/api/org/apache/tools/ant/filters/SortFilter.html

Gary

On Thu, Jul 20, 2023, 07:38 Gilles Sadowski  wrote:

> [...]


Re: [commons-io] question: file content merge sort and binary search

2023-07-20 Thread Gilles Sadowski
Hi.

[Disclaimer: I'm neither a user nor a developer of "Commons IO", so
I'm not the most suitable person to entertain this conversation and,
surely, I shouldn't be the only one...]

On Thu, Jul 20, 2023 at 10:33, ssz  wrote:
>
> Hi
> Sure, I will support my code.
> I have a lot of other opensource projects, not so much free time.

I have to point out that the two sentences seem to neutralize
each other...

> But this code will have the highest priority as Commons is used by
> thousands of developers.

That's what I've heard, but I did not see much proof: we have no
reliable way to know where "Commons" code is used.  [This was a
feature of open-source, but new regulations might make it a liability...]
More importantly, if true, only a very tiny fraction of those users share
their experience here, so that a quite small number of "regular"
developers end up deciding what is useful.  Almost inevitably, the
selection is biased...

> My other projects are used by hundreds of people.

That's great, but it would not convince (given the lack of feedback)
a committer here who is not among those users.

The general problem is:
 1. The active team is not getting bigger.
 2. Those "regular" developers find they already have too much to
 handle.
 3. Hence they tend not to readily accept contributions that are (or
 seem) likely to require time which they don't have.
 4. This puts off would-be contributors that could have become part
 of the active team.
 1. The active team is not getting bigger...

So I'm trying to find other arguments...
Which projects (ASF?) depend on your proposed contribution?

Regards,
Gilles

>>> [...]




Re: [commons-io] question: file content merge sort and binary search

2023-07-20 Thread ssz
Hi
Sure, I will support my code.
I have a lot of other opensource projects, not so much free time.
But this code will have the highest priority as Commons is used by
thousands of developers. My other projects are used by hundreds of people.

On Wed, Jul 19, 2023 at 8:28 PM Gilles Sadowski 
wrote:

> [...]


Re: [commons-io] question: file content merge sort and binary search

2023-07-19 Thread Gilles Sadowski
Hi.

On Tue, Jul 18, 2023 at 19:06, ssz  wrote:
>
> [...]
>
> We use this library as a second-level cache when parsing CIMXML RDF, this
> file-based cache contains triples, and also subject-type pairs (RDF nodes).
> It is not csv.
> Also, I'm thinking about RDF-Graph implementation backed by fs.

This is where the discussion, about whether "Commons" is the
right place, could start because...

>
> So, I think we can always find ways to use this functionality.
> Placing it in some common place would save other developers time.

... placing it here implies that there will be people willing to stay
around and maintain ...

> Implementation of file-sorting and searching is not so simple as it sounds.

... this "not so simple" functionality.

That's why we ask for use-cases: People who have a direct
interest in maintaining the functionality are more likely to help
fix it when the need arises.
IOW, I'd expect the contributors of a major functionality of
which they are the only known users to stay around in order
to support it.

Regards,
Gilles

> [...]




Re: [commons-io] question: file content merge sort and binary search

2023-07-19 Thread ssz
I added some more details to README.md.
Please let me know if I should add anything else to make it clearer.

On Tue, Jul 18, 2023 at 7:25 PM Gilles Sadowski 
wrote:

> [...]


Re: [commons-io] question: file content merge sort and binary search

2023-07-18 Thread ssz
I thought everything was described in sufficient detail in the documentation
and the project's README...
Obviously it is not so clear; my fault, sorry.
I should probably add more explanation to README.md.
But I thought that merge sort and binary search are well-known
algorithms, and we all know exactly how to do such operations in memory, so
the use cases are familiar too.

It is a Java/Kotlin library, so you can use it from any environment, not
only Linux: for example, Android, with limited disk space and memory. When
memory is limited, you can't just use `java.util.Arrays.sort`.
And calling a Linux utility from code requires some coding, and some
investigation too. For me, it is good to have a Java utility for such a
purpose. I can't spin up Testcontainers every time I just need to sort a file.
Also, I'm not sure that Linux utilities provide a way to sort with an
arbitrary comparator. Maybe they do, but then you have to spend time
finding the right parameters.

As for usage:
it can be used to sort CSV, or some "broken" CSV where rows are separated
by a delimiter but each row contains a different number of words.
Of course, it can be used for sorting any file that has some delimiter.
It can be used to create indexed files.
We could probably use some framework or database for that,
but sometimes we don't want to bring in heavy dependencies for the sake of
a few lines of code.
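An arbitrary comparator is what makes this usable for such "broken" CSV. A sketch of a comparator for delimiter-separated rows with varying numbers of fields: fields are compared one by one as plain strings, and when one row runs out of fields the shorter row sorts first. The delimiter parameter and this ordering rule are assumptions for illustration, not the library's behaviour:

```java
import java.util.Comparator;
import java.util.regex.Pattern;

final class DelimitedRows {
    /** Compares rows field by field; a row whose fields are a prefix of another's sorts first. */
    static Comparator<String> byFields(String delimiter) {
        Pattern split = Pattern.compile(Pattern.quote(delimiter));
        return (a, b) -> {
            String[] x = split.split(a, -1); // limit -1 keeps trailing empty fields
            String[] y = split.split(b, -1);
            int n = Math.min(x.length, y.length);
            for (int i = 0; i < n; i++) {
                int c = x[i].compareTo(y[i]);
                if (c != 0) {
                    return c;
                }
            }
            return Integer.compare(x.length, y.length);
        };
    }
}
```

Fields are compared as strings, so "10" sorts before "2"; a numeric column would need its own comparator. The same `Comparator<String>` can be passed to any line-based sort, in memory or on disk.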

We use this library for a second-level cache when parsing CIMXML RDF; this
file-based cache contains triples, and also subject-type pairs (RDF nodes).
It is not CSV.
Also, I'm thinking about an RDF-Graph implementation backed by the file system.
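A sorted store makes such a cache cheap to query by key: all lines sharing a prefix are contiguous, so one search plus a forward scan finds them. A sketch using an in-memory `NavigableSet` as a stand-in for the sorted file; the tab-separated `subject predicate object` line layout and the `TripleLookup` name are assumptions, since the actual cache format is not described in this thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class TripleLookup {
    /**
     * Returns all lines whose first tab-separated field equals subject.
     * In sorted order every line starting with the prefix forms one contiguous block.
     */
    static List<String> bySubject(NavigableSet<String> sortedLines, String subject) {
        String prefix = subject + '\t';
        List<String> result = new ArrayList<>();
        for (String line : sortedLines.tailSet(prefix, true)) { // first line >= prefix
            if (!line.startsWith(prefix)) {
                break; // left the contiguous block
            }
            result.add(line);
        }
        return result;
    }
}
```

Appending the separator to the key stops `ex:a` from also matching a subject such as `ex:ab`.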

So, I think we can always find ways to use this functionality.
Placing it in some common place would save other developers time.
Implementing file sorting and searching is not as simple as it sounds.
You have to think about memory and performance, and maybe about
disk space.
The library uses Java NIO and coroutines. It deals with ByteArrays,
ByteBuffers and indexes.
It is easy to make mistakes.
Some developers may think that this is easy; after trying to write a simple
version, they may realize that it is not worth the time, and eventually
settle for some kind of compromise, for example, calling a Linux
utility.
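To illustrate one of those easy mistakes: when binary-searching a sorted text file by byte offset, the middle offset usually lands inside a line, so the search must first skip to the next line boundary without skipping a line that starts exactly there. A minimal sketch using `java.io.RandomAccessFile` (not NIO, and not the library's implementation), assuming the file is sorted by `String.compareTo` and contains only ASCII, because `RandomAccessFile.readLine` maps each byte to one char:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class SortedFileSearch {
    /** True if a line equal to key exists in the sorted, ASCII-only file. */
    static boolean contains(RandomAccessFile file, String key) throws IOException {
        return key.equals(lineAtOrAfter(file, lowerBound(file, key)));
    }

    /**
     * Smallest byte position p such that the first line starting at or after p is >= key
     * (end of file counts as greater than every key).
     */
    static long lowerBound(RandomAccessFile file, String key) throws IOException {
        long lo = 0;
        long hi = file.length();
        while (lo < hi) {
            long mid = (lo + hi) >>> 1;
            String line = lineAtOrAfter(file, mid);
            if (line != null && line.compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Returns the first full line that starts at or after pos, or null at end of file. */
    private static String lineAtOrAfter(RandomAccessFile file, long pos) throws IOException {
        if (pos == 0) {
            file.seek(0);
        } else {
            // pos usually lands mid-line: step back one byte and discard up to the next
            // line break, so that a line starting exactly at pos is not skipped.
            file.seek(pos - 1);
            file.readLine();
        }
        return file.readLine(); // null at end of file
    }
}
```

Each probe costs one seek plus reading about one line, so a lookup needs O(log n) probes for a file of n bytes; a production version would read through buffers (for example NIO `ByteBuffer`s) rather than the unbuffered, byte-by-byte `readLine`.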




On Tue, Jul 18, 2023 at 7:25 PM Gilles Sadowski 
wrote:

> [...]


Re: [commons-io] question: file content merge sort and binary search

2023-07-18 Thread Gilles Sadowski
Hello.

On Tue, Jul 18, 2023 at 17:35, ssz  wrote:
>
> here https://github.com/sszuev/textfile-utils-examples/tree/master/src/test

Yes, this shows the API and its usage, but I was also wondering
about actual uses.  What kind of applications would need to call
this functionality from Java?  What does your implementation bring
which a user cannot do with "sort"?[1]

Best regards,
Gilles

[1] https://en.wikipedia.org/wiki/Sort_(Unix)

> [...]




Re: [commons-io] question: file content merge sort and binary search

2023-07-18 Thread ssz
here https://github.com/sszuev/textfile-utils-examples/tree/master/src/test

On Tue, Jul 18, 2023 at 12:03 PM Gilles Sadowski 
wrote:

> Hello.
>
> On Tue, Jul 18, 2023 at 10:50, ssz wrote:
> >
> > Hello there
> >
> > I see this issue on hold.
> > So far, no one else has an opinion on this issue.
>
> Maybe "Commons Text"?
> It would help to see use-cases and API examples (in Java).
>
> Regards,
> Gilles
>
> > I'm going to unsubscribe from this list for a while.
> > Please email me directly in case of a positive final decision.
> >
> > sss.zuev {at} gmail / com
> >
> >>> [...]
>


Re: [commons-io] question: file content merge sort and binary search

2023-07-18 Thread Gilles Sadowski
Hello.

On Tue, Jul 18, 2023 at 10:50, ssz wrote:
>
> Hello there
>
> I see this issue on hold.
> So far, no one else has an opinion on this issue.

Maybe "Commons Text"?
It would help to see use-cases and API examples (in Java).

Regards,
Gilles

> I'm going to unsubscribe from this list for a while.
> Please email me directly in case of a positive final decision.
>
> sss.zuev {at} gmail / com
>
>>> [...]




Re: [commons-io] question: file content merge sort and binary search

2023-07-18 Thread ssz
Hello there

I see this issue is on hold.
So far, no one else has an opinion on this issue.
I'm going to unsubscribe from this list for a while.
Please email me directly in case of a positive final decision.

sss.zuev {at} gmail / com

Thanks!

On Mon, Jul 10, 2023 at 12:17 AM Gary Gregory 
wrote:

> I've thought about this a little more and it seems to me that sorting and
> searching through any old binary file does not fit the remit of Commons IO
> or CSV. If anything it would be a new component, but it feels like the kind
> of database operations that do not fit in Commons.
>
> What do others think?
>
> Gary


Re: [commons-io] question: file content merge sort and binary search

2023-07-09 Thread ssz
A few remarks:

-- I think the (CSV/Java) Stream API is not directly suitable: it is difficult
to implement sorting using streams alone. If you can suggest a simpler
solution, I would really appreciate it, because the simpler the code, the
better. Of course, the library itself uses streams (channels) and streams (data
flows).
-- However it may seem, the problem is not as trivial as it sounds. We have to
care about memory, disk space and performance. Looking through the code and
commits should give an idea of the scope of the solution.
-- The library is designed to work with any file that can be divided into
records by some separator (this is the main purpose of the library; there are
also some other utilities).
-- It is not very clear to me why the ability to sort lists of files is
suitable for commons-io, but content-sorting functionality is not.
-- It would be great if you could suggest where else this functionality could
go. I am not inclined to insist on commons-io, but I'm sure having such
functionality somewhere in a well-known place would save other developers
time. It's strange that there is no such functionality anywhere yet (or maybe
I couldn't find it). Of course, this is done in databases and other serious
frameworks, but sometimes we don't want to pull in heavy dependencies for the
sake of a few lines of code.
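
The memory, disk space and performance concern in the remarks above is what an external merge sort addresses. The following is a minimal Java sketch of that textbook technique, not a port of textfile-utils (which is written in Kotlin with coroutines): it cuts a newline-delimited UTF-8 file into runs of at most `maxLinesPerChunk` lines, sorts each run in memory, spills it to a temporary file, and then k-way merges the runs with a `PriorityQueue`. The class name `ExternalLineSort` and the line-count limit are assumptions; a real implementation would more likely budget by bytes.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of an external merge sort for newline-delimited UTF-8 text. */
final class ExternalLineSort {

    private ExternalLineSort() {}

    static void sort(Path input, Path output, Comparator<String> order, int maxLinesPerChunk)
            throws IOException {
        if (maxLinesPerChunk < 1) {
            throw new IllegalArgumentException("maxLinesPerChunk must be positive");
        }
        List<Path> runs = new ArrayList<>();
        try {
            // Phase 1: cut the input into runs small enough to sort in memory, spilling each to disk.
            try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                List<String> buffer = new ArrayList<>();
                String line;
                while ((line = reader.readLine()) != null) {
                    buffer.add(line);
                    if (buffer.size() == maxLinesPerChunk) {
                        runs.add(spillSorted(buffer, order));
                        buffer.clear();
                    }
                }
                if (!buffer.isEmpty()) {
                    runs.add(spillSorted(buffer, order));
                }
            }
            // Phase 2: k-way merge; memory holds one pending line per run.
            merge(runs, output, order);
        } finally {
            for (Path run : runs) {
                Files.deleteIfExists(run);
            }
        }
    }

    private static Path spillSorted(List<String> lines, Comparator<String> order) throws IOException {
        lines.sort(order); // List.sort is stable
        Path run = Files.createTempFile("run", ".txt");
        Files.write(run, lines, StandardCharsets.UTF_8);
        return run;
    }

    /** Current head line of one run, plus where it came from. */
    private static final class Head {
        final String line;
        final int run;
        final BufferedReader reader;

        Head(String line, int run, BufferedReader reader) {
            this.line = line;
            this.run = run;
            this.reader = reader;
        }
    }

    private static void merge(List<Path> runs, Path output, Comparator<String> order) throws IOException {
        // Ties go to the earlier run, which keeps the overall sort stable.
        PriorityQueue<Head> heap = new PriorityQueue<>(
                Comparator.comparing((Head h) -> h.line, order).thenComparingInt(h -> h.run));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (int i = 0; i < runs.size(); i++) {
                BufferedReader reader = Files.newBufferedReader(runs.get(i), StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    heap.add(new Head(first, i, reader));
                }
            }
            while (!heap.isEmpty()) {
                Head head = heap.poll();
                writer.write(head.line);
                writer.newLine();
                String next = head.reader.readLine();
                if (next != null) {
                    heap.add(new Head(next, head.run, head.reader));
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

With a key comparator such as "text before the first comma", ties are resolved by run order, so records with equal keys keep their input order across runs as well as within them.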



On Sun, Jul 9, 2023 at 11:07 PM Gary Gregory  wrote:

> Commons CSV supports the Java Streaming API so you can do whatever that API
> offers,  including filtering, sorting, finding, and so on.
>
> More than plain CSVs are supported, and I encourage you to peruse the site
> https://commons.apache.org/proper/commons-csv/
>
> If you think that component can be enhanced, feel free to keep the
> conversation going with a more specific proposal.
>
> WRT Commons IO, it seems to me that IO is a lower level component and does
> not match your offering and that Commons CSV might too much toward CSV
> files. OTHO, it does not seem like what you propose would be generic enough
> to parse any binary file, say an old school dBASE file, but I could be
> wrong.
>
> Gary

Re: [commons-io] question: file content merge sort and binary search

2023-07-09 Thread Gary Gregory
I've thought about this a little more and it seems to me that sorting and
searching through any old binary file does not fit the remit of Commons IO
or CSV. If anything it would be a new component, but it feels like the kind
of database operations that do not fit in Commons.

What do others think?

Gary


On Sun, Jul 9, 2023, 16:07 Gary Gregory  wrote:

> Commons CSV supports the Java Streaming API so you can do whatever that
> API offers,  including filtering, sorting, finding, and so on.
>
> More than plain CSVs are supported, and I encourage you to peruse the site
> https://commons.apache.org/proper/commons-csv/
>
> If you think that component can be enhanced, feel free to keep the
> conversation going with a more specific proposal.
>
> WRT Commons IO, it seems to me that IO is a lower level component and does
> not match your offering and that Commons CSV might too much toward CSV
> files. OTHO, it does not seem like what you propose would be generic enough
> to parse any binary file, say an old school dBASE file, but I could be
> wrong.
>
> Gary


Re: [commons-io] question: file content merge sort and binary search

2023-07-09 Thread Gary Gregory
Commons CSV supports the Java Streaming API so you can do whatever that API
offers,  including filtering, sorting, finding, and so on.

More than plain CSVs are supported, and I encourage you to peruse the site
https://commons.apache.org/proper/commons-csv/

If you think that component can be enhanced, feel free to keep the
conversation going with a more specific proposal.

WRT Commons IO, it seems to me that IO is a lower-level component and does
not match your offering, and that Commons CSV might be too geared toward CSV
files. OTOH, it does not seem like what you propose would be generic enough
to parse any binary file, say an old-school dBASE file, but I could be
wrong.

Gary
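
Gary's point about the Stream API can be illustrated with the JDK alone (this sketch does not use Commons CSV's own API, and `StreamLineSort` is a hypothetical name). Sorting and filtering records really are one-liners with `java.util.stream`. The trade-off at issue in this thread is that `sorted()` is a stateful intermediate operation: as the `java.util.stream` package documentation puts it, sorting cannot produce any results until it has seen all elements, so the records of the file end up held in memory. A prefix filter, in turn, is a linear scan rather than a binary search.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: in-memory sort and filter of newline-delimited records with java.util.stream. */
final class StreamLineSort {

    private StreamLineSort() {}

    static List<String> sortedLines(Path file, Comparator<String> order) throws IOException {
        // Files.lines is lazy, but sorted() must buffer every line before emitting the first one.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.sorted(order).collect(Collectors.toList());
        }
    }

    /** Filtering streams line by line, but it reads the whole file: a scan, not a binary search. */
    static List<String> withPrefix(Path file, String prefix) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.startsWith(prefix)).collect(Collectors.toList());
        }
    }
}
```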

On Sun, Jul 9, 2023, 13:35 ssz  wrote:

> Does common-csv support **sorting** large?
> Does it support binary search?
> What should I do if I have a non-csv text file?
>
> Actually I didn't say that textfile-utils is a library for working with csv
> files.
> I just provided you with an example.


Re: [commons-io] question: file content merge sort and binary search

2023-07-09 Thread ssz
Does commons-csv support **sorting** large files?
Does it support binary search?
What should I do if I have a non-CSV text file?

Actually, I didn't say that textfile-utils is a library for working with CSV
files.
I just gave you an example.

On Sun, Jul 9, 2023 at 8:23 PM Gary Gregory  wrote:

> If the intent is to process CSV files, you're missing quite parameters in
> order to process all of the different CSV flavors, see Apache Commons CSV.
>
> Gary


Re: [commons-io] question: file content merge sort and binary search

2023-07-09 Thread Gary Gregory
If the intent is to process CSV files, you're missing quite a few parameters
needed to process all of the different CSV flavors; see Apache Commons CSV.

Gary


On Sun, Jul 9, 2023, 13:16 ssz  wrote:

> text-files sort. e.g. CSV.
>
> Example:
> content: `d,420;b,42;b,21;a;21;c;"42"`, delimiter ';'
> after sort by prefix: `a:21;b,42;b,21;c:"42";d,420`
> binary search by prefix `b`: `b,42;b,21`
>
> The project is completed with tests and documentation.
> It is open source.
> Github: https://github.com/DataFabricRus/textfile-utils
>
> I think there shouldn't be any problems with reading the code.
> Kotlin - is advanced java, or you can consider it as pseudocode.
>
> Perhaps I should supplement the description in `README.md` to make it
> clearer?
> Could you please tell me what I should include?
>
> Yes, many databases have sorted files under the hood.
> But what should I do if I need to just search in a big file?
> I can't reuse database code, I can't make a particular trivial task more
> complicated by using a database. I haven't been able to find any good
> solutions in regular libraries.
> So I invented this bicycle.I think the desire to have such a library is
> understandable.
>
> Please ask any questions.


Re: [commons-io] question: file content merge sort and binary search

2023-07-09 Thread ssz
More examples (in code):
(sort)
https://github.com/DataFabricRus/textfile-utils/blob/main/src/test/kotlin/MergeSortTest.kt#L202
(search)
https://github.com/DataFabricRus/textfile-utils/blob/main/src/test/kotlin/BinarySearchTest.kt#L20


On Sun, Jul 9, 2023 at 8:14 PM ssz  wrote:

> text-files sort. e.g. CSV.
>
> Example:
> content: `d,420;b,42;b,21;a;21;c;"42"`, delimiter ';'
> after sort by prefix: `a:21;b,42;b,21;c:"42";d,420`
> binary search by prefix `b`: `b,42;b,21`
>
> The project is completed with tests and documentation.
> It is open source.
> Github: https://github.com/DataFabricRus/textfile-utils
>
> I think there shouldn't be any problems with reading the code.
> Kotlin - is advanced java, or you can consider it as pseudocode.
>
> Perhaps I should supplement the description in `README.md` to make it
> clearer?
> Could you please tell me what I should include?
>
> Yes, many databases have sorted files under the hood.
> But what should I do if I need to just search in a big file?
> I can't reuse database code, I can't make a particular trivial task more
> complicated by using a database. I haven't been able to find any good
> solutions in regular libraries.
> So I invented this bicycle.I think the desire to have such a library is
> understandable.
>
> Please ask any questions.


Re: [commons-io] question: file content merge sort and binary search

2023-07-09 Thread ssz
Text-file sorting, e.g. CSV.

Example:
content: `d,420;b,42;b,21;a,21;c,"42"`, delimiter ';'
after sort by prefix: `a,21;b,42;b,21;c,"42";d,420`
binary search by prefix `b`: `b,42;b,21`
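
A minimal in-memory Java sketch of what this example describes, assuming the records are `d,420`, `b,42`, `b,21`, `a,21` and `c,"42"` (separated by `;`), that the sort key is the text before the first `,`, and that the sort is stable, which is why `b,42` stays ahead of `b,21`. The prefix search is a lower-bound binary search followed by a forward scan. `PrefixRecords` and its methods are hypothetical; textfile-utils applies the idea to files on disk rather than to a list in memory, and its actual API may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical in-memory illustration of "sort by prefix" and "binary search by prefix". */
final class PrefixRecords {

    private PrefixRecords() {}

    /** Sort key: the text before the first comma, or the whole record if there is no comma. */
    static String key(String record) {
        int comma = record.indexOf(',');
        return comma < 0 ? record : record.substring(0, comma);
    }

    /** Splits the content on the delimiter and stable-sorts the records by key. */
    static List<String> sortByKey(String content, char delimiter) {
        String[] parts = content.split(Pattern.quote(String.valueOf(delimiter)));
        List<String> records = new ArrayList<>(Arrays.asList(parts));
        records.sort(Comparator.comparing(PrefixRecords::key)); // stable: b,42 stays before b,21
        return records;
    }

    /** Lower-bound binary search on the keys, then a forward scan over the matching run. */
    static List<String> searchByPrefix(List<String> sorted, String prefix) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (key(sorted.get(mid)).compareTo(prefix) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        List<String> result = new ArrayList<>();
        for (int i = lo; i < sorted.size() && key(sorted.get(i)).startsWith(prefix); i++) {
            result.add(sorted.get(i));
        }
        return result;
    }
}
```

Joining the sorted list with `;` gives `a,21;b,42;b,21;c,"42";d,420`, and searching it for prefix `b` returns `b,42` and `b,21`.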

The project is complete, with tests and documentation.
It is open source.
Github: https://github.com/DataFabricRus/textfile-utils

I don't think there should be any problems reading the code.
Kotlin is essentially an advanced Java, or you can treat it as pseudocode.

Perhaps I should expand the description in `README.md` to make it
clearer?
Could you please tell me what I should include?

Yes, many databases have sorted files under the hood.
But what should I do if I just need to search in a big file?
I can't reuse database code, and I can't justify making a trivial task more
complicated by using a database. I haven't been able to find any good
solutions in regular libraries.
So I reinvented the wheel. I think the desire to have such a library is
understandable.

Please ask any questions.


On Sun, Jul 9, 2023 at 6:40 PM Gary Gregory  wrote:

> Hello,
>
> This seems to be me like a mismatch with Commons IO.
>
> What does it even mean to "sort" a file which are really a bunch of bytes.
> Do you have a relevant example (Java base)?
>
> This feels more like a database primitive to me. What am I missing?
>
> Gary


Re: [commons-io] question: file content merge sort and binary search

2023-07-09 Thread Gary Gregory
Hello,

This seems to me like a mismatch with Commons IO.

What does it even mean to "sort" a file, which is really a bunch of bytes?
Do you have a relevant example (Java base)?

This feels more like a database primitive to me. What am I missing?

Gary

On Sun, Jul 9, 2023, 10:42 ssz  wrote:

> It seems to be well-known and generic functionality, so it would be nice to
> have it in some well-known common place.
> Is *apache/commons-io* this place?
>
> Here is the draft: https://github.com/DataFabricRus/textfile-utils
> This is my library made for DataFablic, it is written on kotlin with
> coroutines and Java NIO.
> Of course, it can be ported to java (preserving kotlin-version for
> multiplatform)
>