need to wait a little more. The good news
is that the lessons learned while working on the Java code should help
with XZ Utils.
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
before 1.0. I don't have anything
planned before 1.0 but maybe someone finds something that could be
improved. :-)
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
in a Maven
repository would be useful also for Apache Commons Compress integration.
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
in decompressing ZIP files.
Probably not in practice. The spec doesn't mention them, alas.
The Java version doesn't support LZMA1. If it is adapted to support it,
there's no similar limit of lc + lp = 4 as there is in liblzma
because in Java the arrays are allocated one by one.
--
Lasse Collin
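As a rough illustration of the lc + lp remark above: in LZMA each literal
subcoder has 0x300 probability counters, and there are 1 << (lc + lp)
subcoders. The sketch below is not code from XZ for Java or liblzma; the
class and method names are made up for this example. It only computes how
many counters a given lc/lp pair needs, which hints at why a fixed-size
array presumably wants a cap while per-subcoder allocation does not.

```java
final class LiteralCoderSizeSketch {
    // Number of probability counters in one LZMA literal subcoder.
    static final int SUBCODER_SIZE = 0x300;

    // Hypothetical helper: number of literal subcoders for given lc and lp.
    // LZMA1 allows lc in 0..8 and lp in 0..4.
    public static int subcoderCount(int lc, int lp) {
        if (lc < 0 || lc > 8 || lp < 0 || lp > 4)
            throw new IllegalArgumentException("lc or lp out of range");
        return 1 << (lc + lp);
    }

    // Total number of literal probability counters for given lc and lp.
    public static long totalCounters(int lc, int lp) {
        return (long) SUBCODER_SIZE * subcoderCount(lc, lp);
    }

    public static void main(String[] args) {
        System.out.println(totalCounters(3, 0)); // common default lc=3, lp=0
        System.out.println(totalCounters(0, 4)); // lc + lp = 4
        System.out.println(totalCounters(8, 4)); // largest LZMA1 combination
    }
}
```

With lc + lp limited to 4 the total stays at 12288 counters; with lc=8 and
lp=4 it grows to 3145728, so allocating the subcoders one by one only costs
memory when such settings are actually used.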
XZ for Java 1.0 was released earlier this week:
http://tukaani.org/xz/java.html
The code is available also in Maven Central. The actual Java code
is identical to the version 0.4, but I made a new release to make
it clear that the code and API should now be stable.
--
Lasse Collin | IRC
is also in xz -lvv --robot output so it should be easy to
parse. The idea of --robot is to make parsing simple and stable across
xz versions.
I didn't update the man page yet.
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
it probably will need some sort of soft
default limit to keep the maximum memory usage sane. The definition of
sane is unclear though. It's not necessarily the same as for
compression.
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
On 2011-11-28 Thorsten Glaser wrote:
Lasse Collin dixit:
If xz does indeed know it needs a zero’d allocation and
can express that in page sizes (pretty non-portable),
_and_ has fallback code for mmap-less architectures (e.g.
several POSIX-for-Windows systems or ancient OSes) then
sure
--lzma2 -e
the -e option is ignored. The -e only affects the presets -0 ... -9. If
you want to take -8e as the starting point and then adjust the
dictionary size, use this:
xz --lzma2=preset=8e,dict=${DICT}KiB
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
is modified, I think it should also allow (de)compression of
symlinks and setuid, setgid, and sticky files. This way it would match
what --force does.
I would like to hear what people think about this.
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
UnsupportedOptionsException.
* Fix bugs in the preset dictionary support in the LZMA2 encoder.
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
On 2013-04-03 Pavel Raiskup wrote:
Hi all, would you please consider the following patch? It is adding
support for the '-h' grep option into xzgrep also. The author is Jeff
Bastian.
Thanks. Committed.
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
for the decompression side too. I haven't
decided yet how to fix it (e.g. require an option or perhaps always
disable buffering).
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
implementations. The reason for using -e is that the
relative improvement tends to be bigger when that option is used. On
x86-64 I've seen even 25 % faster compression with some files compared
to the byte-by-byte method.
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
, compression, and possibly also encryption are done by a
single file format and tool. When using tar with gzip/bzip2/xz/whatever
the tasks are done by separate file formats and tools.
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
as there is just a single replacement header, other tricks
could work. Since there might be other replacement headers in the
future, it's no use to change this. It's simplest to add a one-line
wrapper header for VS 2013.
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
.
* Fixed xzdiff to be compatible with FreeBSD's mktemp which differs
from most other mktemp implementations.
* Changed CPU core count detection to use cpuset_getaffinity() on
FreeBSD.
--
Lasse Collin | IRC: Larhzu @ IRCnet Freenode
On 2015-06-11 Lasse Collin wrote:
On 2015-05-28 Adam Walling wrote:
Everything is updated at https://github.com/adzm/xz_win now
You can also grab them directly at http://adzm.net/xz_win/
Thanks! It looks good now. There are still a few unneeded headers
listed, but those should have
n even though the reason
was known. I committed the fix to the master branch (it will be in
v5.2 before 5.2.3).
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
I'm sorry that it takes so long for me to reply. :-(
On 2015-06-30 Adam Walling wrote:
On Fri, Jun 19, 2015 at 1:42 PM, Lasse Collin
lasse.col...@tukaani.org wrote:
...
My understanding is that the symbols exported from a DLL should be
marked with __declspec(dllexport) when building
inux issue (or similar), it would be good to report it to
the distribution maintainers.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
e but it would be OK too if there were no
other way.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
ck of warnings doesn't mean that there
aren't any aliasing issues.
src/liblzma/common/memcmplen.h is another place where similar casts are
done. It's a new file in 5.2.x (not present in 5.0.x) and was added to
get a bit better performance for common buffer comparisons.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
w LZMA (i.e. LZMA1) streams
and to the legacy .lzma format.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Thanks for your comment!
On 2016-12-18 John Reiser wrote:
> On 12/18/2016 11:30 AM, Lasse Collin wrote:
> > There's a bug report about data loss due to a computer losing power.
> > fsync() or fdatasync() in xz would quite likely have avoided the
> > data loss.
> >
>
he value that is internally available for
--threads=0.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
d lzma_file_info_decoder() into liblzma and use it in xz to
implement the --list feature.
* Capsicum sandbox support is enabled by default where available
(FreeBSD >= 10).
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
> As XZ uses org.tukaani.xz as its OSGi bundle name, it seems logical to
> use the same name for a module. The simple change below would achieve
> that.
Thanks. I will add that. There needs to be a new release soon anyway to
fix the issue of XZ for Java 1.7 binaries in the Maven Central
"ant". Set it to 1.6 or higher. The default value 1.5
isn't supported by OpenJDK 9 or later.
* Add "Automatic-Module-Name" = "org.tukaani.xz".
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
ould be useful for
testing compiler-specific things, but since I already have trouble
getting things done, I probably am not going to spend time on setting
it up etc. in the foreseeable future.
> In any source file that refers to Intel intrinsics, there should be a
> #include of to futu
On 2018-01-01 Stefan Bodewig wrote:
> On 2017-12-31, Lasse Collin wrote:
> > Would it be too complicated to turn XZ into a proper module? How
> > useful is that?
>
> As XZ hasn't got any dependencies you'd only benefit from an explicit
> module info if you wanted to rest
e input and output chunk sizes is
probably needed to make the second version faster. You could try
some odd values between 100 and 250, or maybe even up to 500.
On the other hand, it's possible that I'm putting too much weight on the
importance of fuzzing the stop & continue code paths.
Thanks again!
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
as valuable for fuzzing as I thought
and we should just use the simple fast version.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
On 2018-11-22 Pavel Raiskup wrote:
> ---
> src/xz/signals.c | 2 +-
> src/xz/xz.1 | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
Thanks! Committed.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
.4 to
be translated (or submit to a small subgroup to find out if something
needs to be fixed at xz side first)?
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
On 2019-06-23 Mario Blättermann wrote:
> Am Sa., 22. Juni 2019 um 23:53 Uhr schrieb Lasse Collin
> :
> > - Is xzdec-man.pot intentionally there or should it be part of
> > xz-man.pot?
> >
> It's an artifact from a separate creation of a pot file for xzdec.1.
>
the line length and alignment problems for good.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
On 2020-02-12 Mario Blättermann wrote:
> Lasse Collin schrieb am Fr., 7. Feb. 2020,
> 15:32:
> > - po4a is never needed when building from a release tarball.
>
> This means, distribution packagers don't have to bother with po4a,
> and the translated man pages will be in
in that email are
still valid. In addition to testing with MSVC, it can now be tested on
other platforms too.
At least I would like to know if it works with MSVC to build static and
shared liblzma, and does it name the shared liblzma file "liblzma.dll"
as I hope it does.
Thanks!
--
La
ds like a bug in the CMake files unless MSVC really supports
__builtin_assume_aligned that GCC and Clang support. xz 5.2.4 doesn't
recognize that define so it's harmless there. So this must be fixed too
if it is broken. The above snapshot is good for testing this with CMake.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
y, more feedback from CMake users is welcome. The liblzma part
should work on multiple operating systems; it's not Windows-only
anymore. See the comment in the beginning of CMakeLists.txt in xz.git.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
ripting for those details is
ugly). The test should be done so that there are translated man pages
available in the po4a/man directory, e.g. by using de.po from the first
post of this thread.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
ic to GNU/Linux" would be
important (and a check if librt is needed for clock_gettime). The M4
macros are in m4/tuklib_cpucores.m4 and m4/tuklib_physmem.m4. These
aren't a priority for now as the main reason for CMake support still is
MSVC and gett
huge so the command line lengths don't get too long, but
is it bad/ugly on Windows?
- NDEBUG shouldn't be #defined for debug builds.
Thoughts, fixes, suggestions etc. are welcome.
--
Lasse Collin |
On 2020-02-14 Mario Blättermann wrote:
> Am Mi., 12. Feb. 2020 um 19:05 Uhr schrieb Lasse Collin
> :
> > On 2020-02-12 Mario Blättermann wrote:
> > > Lasse Collin schrieb am Fr., 7. Feb.
> > > 2020, 15:32:
> > > > The extra po4a options like unkn
o liblzmaConfig.cmake
since I got an impression from other sources that it is required.
Thanks for your help!
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
#
#
# This provides very limited CMake support:
#
On 2020-02-16 Lasse Collin wrote:
> I quickly checked a few random things and didn't find anything else
> that wouldn't be OK with 3.1.
target_link_options needs 3.13. Seems that it can be avoided by using
set_target_properties(liblzma PROPERTIES LINK_FLAGS
-Wl,--version-
f supporting old versions largely depends on if the CMake
support will become usable for more than building liblzma on Windows.
> On Sun, Feb 16, 2020 at 4:48 PM Lasse Collin
> wrote:
> > If I use add_library(lzma ...) and PREFIX "lib", isn't the end
> > result practic
rom vcpkg.
A link to a known good CMakeLists.txt (and possible other files) would
be nice along with an explanation what it builds (e.g. both static and
shared liblzma?) and which targets are supported (e.g. is it MSVC
only?). Thanks!
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
me other features)
should have been implemented years ago. However, I haven't had energy
to do it.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
didn't get good results with
megabyte-sized files but perhaps I don't know how to use it
correctly. Using a single .tar as a dictionary worked great though.
In both cases you obviously need latest.tar to decompress the
old*.tar{.delta,.zst} files.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
busy all the time.
With decompression one has to decide how much memory can be used by
default. If there is no limit, in the extreme case a decoder could read
the whole input file into RAM and allocate an output buffer for the whole
uncompressed file. This problem doesn't exist in your mmap (or pread)
approach.
misaligned column headings in tables. In the future, many of
these strings will be split and e.g. the table column
alignment will be handled in software. This should make the
strings easier to translate.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
<https://tukaani.org/xz/>
> Product version: 5.3.1alpha
Looks good. I have finally committed it.
Thanks!
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
endian uint64_t;
UINT64_MAX means unknown and then (and only then)
EOPM must be present
- LZMA2, possibly together with a BCJ or Delta filter, with
lzma_raw_decoder() since LZMA2 always includes the end marker.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
o initialize lzma_alone_decoder(), pass the five
bytes, then pass either the 64-bit uncompressed size (if EOS bit in .zip
headers is unset) or eight 0xFF bytes (if EOS bit in .zip headers is
set), and finally pass the actual LZMA data.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
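The header sequence described in the message above can be sketched as
follows. This is only an illustration in Java, not liblzma or XZ for Java
code; the class and method names are made up. It assumes the .lzma header
layout of five properties bytes followed by a 64-bit little-endian
uncompressed size, where eight 0xFF bytes mean that the size is unknown and
an end marker is used.

```java
final class LzmaAloneHeaderSketch {
    // Hypothetical helper: builds the 13 bytes that would be passed to a
    // .lzma (LZMA_Alone) decoder before the actual LZMA data.
    //   props            the five LZMA properties bytes from the .zip entry
    //   uncompressedSize the uncompressed size from the .zip headers
    //   eosBitSet        true if the EOS bit is set in the .zip headers
    public static byte[] build(byte[] props, long uncompressedSize,
                               boolean eosBitSet) {
        if (props.length != 5)
            throw new IllegalArgumentException("expected 5 properties bytes");

        byte[] header = new byte[13];
        System.arraycopy(props, 0, header, 0, 5);

        // All bits set (eight 0xFF bytes) means "unknown size".
        long size = eosBitSet ? -1L : uncompressedSize;

        // Store the size as a little-endian 64-bit integer.
        for (int i = 0; i < 8; ++i)
            header[5 + i] = (byte) (size >>> (8 * i));

        return header;
    }
}
```

The returned 13 bytes would go to the decoder first and the LZMA data from
the .zip entry right after them.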
on/common_w32res.rc
@@ -6,7 +6,9 @@
*/
#include
-#include "config.h"
+#ifdef HAVE_CONFIG_H
+# include "config.h"
+#endif
#define LZMA_H_INTERNAL
#define LZMA_H_INTERNAL_RC
#include "lzma/version.h"
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Hello!
Since xz-devel is subscribers only, I quote your message in full and
also include your test scripts as an attachment for others to see.
On 2021-01-09 Étienne Mollier wrote:
> Lasse Collin, on 2021-01-09 17:38:20 +0200:
> > The following patch replaces -cdfq with -cdf an
On 2021-01-11 Étienne Mollier wrote:
> Lasse Collin, on 2021-01-11 19:19:09 +0200:
> > I understand from your message that you got a different result. I
> > wonder what would explain the difference. Your results are close to
> > what I would expect with the "trap '' PIP
On 2021-01-10 Sebastian Andrzej Siewior wrote:
> On 2021-01-09 18:21:45 [+0200], Lasse Collin wrote:
> > Any thoughts on this patch?
>
> Yes, I think it makes sense. Following the symlink makes sense but
> removing the symlink is different from removing a file and since i
everyone happy.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
on failures and be able to continue with fewer threads.
However, I haven't heard complaints about this outside xz -T0 context
and I think the existing hack isn't horrible.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
On 2020-12-14 Sebastian Andrzej Siewior wrote:
> On 2020-12-13 23:19:25 [+0200], Lasse Collin wrote:
> > Yes, reusing buffers and encoder/decoder states can be useful (fewer
> > page faults). Perhaps even the input buffer could be reused if it
> > is OK to waste som
nt idle threads) and
with a quick try it seems it might help with decoding too. The
significance depends a lot on the data, of course.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
sn't change the big
picture much and the above paragraph is still true. Implementing
threaded decompression would help xz but only with big packages.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
On 2020-12-27 Vitaly Chikunov wrote:
> On Sat, Dec 26, 2020 at 05:04:02PM +0200, Lasse Collin wrote:
> > I cannot make everyone happy.
>
> Wow, that's philosophical! I think, we should solve this fundamental
> problem first. -- Even if we cannot satisfy everybody, better than
nd perhaps a few variations of it. Thanks!
There are multiple things in XZ Utils that I try to look at in the near
future so it will be a while until I will play with the Java code.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Simply trying to allocate a lot of memory (to test if it works) is more
realistic but I think it's still dumb.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
iles. I
think that code path is never used if the shell is fancy enough to do
complex redirections. However, it traps also SIGPIPE and I think it can
in such situations run rm -rf twice on the same path name. That feels
suspicious but perhaps it's not too bad, especially since the code
likely is never used on most systems.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
On 2021-01-08 Lasse Collin wrote:
> It's tempting to ignore exit statuses >= 128 at the end of the script
> where it currently checks for "$xz_status" -eq 0 but that doesn't work
> because in the middle of the script there is also this:
>
> case $xz_sta
le.
+In earlier versions this was only done with
+.BR \-\-force .
.TP
.BR \-f ", " \-\-force
This option has several effects:
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
or your help!
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
ny extra arguments; using -T0
would be enough. So it's a hack but a hack that has minimal effect on
existing behavior outside -T0.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
version in [2] reads the input
byte[] array byte-by-byte. Using a fast method to read 8 *aligned*
bytes at a time in native byte order should give more speed; after all,
it's one of the benefits of this method that one can read multiple
input bytes at a time.
A public domain patch for a faster CR
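For reference, the byte-by-byte method mentioned above looks roughly like
the following sketch. It is a minimal table-driven CRC64 using the
polynomial of the .xz format (ECMA-182, reflected form); the class name is
made up and this isn't necessarily identical to the code in XZ for Java. A
faster variant would read several input bytes per step, which this sketch
doesn't attempt.

```java
final class Crc64ByteByByteSketch {
    // Reflected form of the ECMA-182 polynomial, as used for CRC64 in .xz.
    private static final long POLY = 0xC96C5795D7870F42L;
    private static final long[] TABLE = new long[256];

    static {
        // Precompute the CRC of every possible single byte value.
        for (int b = 0; b < TABLE.length; ++b) {
            long r = b;
            for (int i = 0; i < 8; ++i) {
                if ((r & 1) == 1)
                    r = (r >>> 1) ^ POLY;
                else
                    r >>>= 1;
            }
            TABLE[b] = r;
        }
    }

    // Calculates the CRC64 of buf[off] ... buf[off + len - 1],
    // consuming one input byte per loop iteration.
    public static long crc64(byte[] buf, int off, int len) {
        long crc = -1;
        for (int end = off + len; off < end; ++off)
            crc = TABLE[(buf[off] ^ (int) crc) & 0xFF] ^ (crc >>> 8);
        return ~crc;
    }
}
```

A version like this is useful as a reference when benchmarking a faster
implementation, because both must return the same value for any input and
any starting offset.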
let's at least try to find some solution to the "xz -T0" case. It would
be nice to hear if my suggestion makes any sense. Thanks.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
r patch will find its way into XZ for Java in some form
but once again I repeat that it will take some time. These XZ projects
are only a hobby for me and currently I don't even turn on my computer
every day.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
ch an algorithm was in the early plans but so were a
few other nice things but many never materialized.
I will look at the SHA-256 patch later. There are unusually many things
in the queue of XZ-related things.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
{
+buf[pos++] = buf[back++];
+} while (--left > 0);
+} else {
+System.arraycopy(buf, back, buf, pos, left);
+pos += left;
+}
if (full < pos)
full = pos;
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
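The idea in the patch fragment above can be sketched in isolation like
this. It's a simplified illustration with made-up names, not the actual
XZ for Java decoder: it uses a plain linear buffer instead of a circular
dictionary and it doesn't try to tune the cutoff for short copies. When the
distance is smaller than the length, the source and destination overlap and
the bytes that were just written must be read again, so System.arraycopy
cannot be used as is.

```java
final class RepeatCopySketch {
    // Hypothetical helper: appends len bytes to buf starting at pos by
    // repeating the data that begins dist bytes before pos.
    // Returns the new write position.
    public static int repeat(byte[] buf, int pos, int dist, int len) {
        if (dist < 1 || dist > pos || len < 0 || len > buf.length - pos)
            throw new IllegalArgumentException("invalid dist or len");

        int back = pos - dist;

        if (dist < len) {
            // Overlapping copy: each byte may depend on a byte written
            // earlier in this same loop, so copy one byte at a time.
            for (int left = len; left > 0; --left)
                buf[pos++] = buf[back++];
        } else {
            // No overlap: a bulk copy gives the same result.
            System.arraycopy(buf, back, buf, pos, len);
            pos += len;
        }

        return pos;
    }
}
```

For example, with "ab" in the buffer, dist = 2 and len = 6 produce
"abababab", which is the usual LZ77 way to encode a repeating pattern.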
use liblzma::liblzma in the config file. I
guess it's too late to change it now.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
.
The same thing happens too if the buffer size is kept at 8192 but the first
byte isn't used (making the beginning of the buffer misaligned).
Moving the "(crc32 >> 32)" to a different position in the xor sequence
can affect things too... it's almost spooky. ;-)
It would be nice if you could compare these too and suggest what should
be committed. Maybe you can figure out an even better version.
Different CPU or 32-bit Java or other things may give quite different
results.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
r 32-bit Java or other things may give
> > quite different results.
>
> Truncating the crc to an int 1 time in the loop seems like a clear
> winner. I will play with this in my benchmark.
> My benchmark is calculating the crc64 of 8k of random bytes. I will
> change it to include misaligned read as well.
Thanks.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
t as multi-release JAR,
so multi-release can be used for other things too.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
t an OK
version to commit?
Is the following fine to you as the file header? Your email address can
be omitted if you prefer that. I will mention in the commit message
that you adapted the code from XZ Utils and benchmarked it.
/*
 * CRC64
 *
 * Authors: Brett Okken
 *          Lasse Collin
 *
 * This file has been put into the public domain.
 * You can do whatever you want with this file.
 */
Thanks!
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
rheads can
be much larger than locking.
I put these optimizations in the "nice to have" category. Something
could be done to make the code better but it's not urgent and so these
won't be in the next release.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
On 2021-02-03 Brett Okken wrote:
> On Wed, Feb 3, 2021 at 2:56 PM Lasse Collin
> wrote:
> > It seems to regress horribly if dist is zero. A file with a very
> > long sequence of the same byte is good for testing.
>
> Would this be a valid test of what you are descri
m even if one uses BufferedInputStream. BufferedInputStream
has synchronized read(). I don't know how much locking matters in this
case. I'm not curious enough to try with a non-synchronized buffered
input stream now.
There are related comments in the "java buffer writes" thread.
--
Lasse Col
se, it is either already finished or
partial output has already been enabled. In both cases
lzma_outq_enable_partial_output() will do nothing.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
(buf, back, buf, pos, copySize);
}
pos += copySize;
left -= copySize;
} while (left > 0);
if (full < pos)
full = pos;
}
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
that
it's worth the extra effort and complexity for such a small speed gain.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
Java 9
for module-info support but otherwise the code should still be
Java 5 compatible (see README and comments in build.properties).
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
as a fairly small effect.
Omitting it keeps the code a tiny bit simpler.
I have committed the change. I think xz-java.git should now be almost
ready for a release. I just need to add NEWS and bump the version
number.
Thanks for your help!
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
On 2021-02-13 Brett Okken wrote:
> On Thu, Feb 11, 2021 at 12:51 PM Lasse Collin
> wrote:
> > I still worry about short copies. If the file is full of tiny
> > matches/repeats of 1-3 bytes or so, arraycopy can be slower. Such
> > files aren't typical at all but I don't w
? I have no Go
experience so I have no idea which are good or already popular.
Thanks!
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
On 2021-04-15 Mario Blättermann wrote:
> Am So., 11. Apr. 2021 um 20:48 Uhr schrieb Lasse Collin
> :
> > I suppose I can just submit a snapshot from the master branch.
I have done this.
> I am curious to see when the first new translations will arrive :)
Me too. It's a lot of wo
s with 32-bit
> userspace, or systems that use XPA (an extension similar to x86's
> PAE).
>
> So, for MIPS32, we have to impose stronger memory limits. I've chosen
> 2000MiB to give the process some headroom.
Thanks! Committed.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
e corrections,
some didn't. Thus multiple translations were omitted from 5.2.5. With
this background I feel that if 5.2.6 is needed I won't consider any
*new* xz.po files for it anyway; new xz-man.po languages would be fine.
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
de, please make sure that word-wrapping
is disabled in the email client or use attachments. Thanks!
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
tem.arraycopy(buf, 0, buf, toCopy, remaining);
}
}
}
}
--
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode
ook like
> > that writeChunk() could be overwriting the input.
>
> I assumed that lz.fillWindow(buf, off, len); would always process the
> 1 byte.
Yes, but it's not immediately obvious to a new reader. Also, many other
classes have tempBuf for identical use so it's good to keep that pa
On 2021-02-05 Brett Okken wrote:
> On Fri, Feb 5, 2021 at 11:07 AM Lasse Collin
> wrote:
> > Also, does it really help to unroll the loop? With 8191-byte
> > buffers I see no significant difference (in a quick
> > not-very-accurate test) if the switch-statement is replac