So this is a somewhat longer mail. As for the upload package granularity, a few 
points:

1. Every OSS project could be one upload. One idea of FOSSology is the reuse of 
analysis results, and that usually works well at the granularity of an OSS 
package, but not with large archives containing all kinds of software.

2. You could group different OSS projects together to analyse them more 
efficiently. Users do that with very small OSS projects, for example npm 
packages.

3. As for “own” or proprietary software: the FOSSology tooling usually scans 
software that is ready for distribution, 99% of it OSS, and thus in most cases 
containing licensing and copyright statements. If you have proprietary software 
without any licensing statements that FOSSology could find, not much will 
happen.

4. Very large uploads require changes to the database settings (shared memory, 
work memory in PostgreSQL) and will result in very large SPDX files (or other 
report files), since FOSSology does not support partial report generation for 
an upload.

5. However, by querying with the Software Heritage agent, you could check 
whether point 5, “Our applications in C and shell scripts.”, really contains 
only your own application code.

So in a nutshell, you could consider splitting your upload into the proposed 5 
parts. You could run the reuse referring to your super-archive if splitting 
means a re-upload to your FOSSology server.
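
As a rough illustration of the splitting idea, here is a minimal Python sketch 
that writes one archive per top-level directory of a large tar file. It assumes 
the five modules each sit in their own top-level directory, which may not match 
your layout; the function name and the paths are made up for this example and 
are not part of FOSSology.

```python
import tarfile
from collections import defaultdict
from pathlib import Path, PurePosixPath


def split_by_top_level(archive: str, out_dir: str) -> list[str]:
    """Write one .tar.gz per top-level entry of `archive` and return the paths.

    Assumption: each module (SDK, kernel, ...) is its own top-level directory.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive) as src:
        # Group archive members by the first component of their path.
        groups = defaultdict(list)
        for member in src.getmembers():
            parts = [p for p in PurePosixPath(member.name).parts if p != "/"]
            if parts:
                groups[parts[0]].append(member)
        for top, members in sorted(groups.items()):
            target = out / f"{top}.tar.gz"
            with tarfile.open(target, "w:gz") as dst:
                for member in members:
                    # Only regular files carry data; directories and links do not.
                    data = src.extractfile(member) if member.isfile() else None
                    dst.addfile(member, data)
            written.append(str(target))
    return written
```

Each resulting archive could then be uploaded separately, for example 
`split_by_top_level("project.tar", "uploads")`.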

So, did this summarise it well?

Regarding multi-language: yes, I agree that could be the reason. It is possible 
if the encoding is something other than UTF-8 (Windows ISO Latin or ISO 8859 
might work, but other encodings maybe not).
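
To find candidate files before uploading, a small Python sketch like the 
following could list the web files that do not decode as UTF-8. This is only a 
heuristic under my own assumptions: the suffix list and the function name are 
invented for the example, and a file flagged here is not necessarily the one 
that broke the report.

```python
from pathlib import Path


def non_utf8_files(root: str,
                   suffixes=(".html", ".htm", ".js", ".css", ".txt")) -> list[str]:
    """Return paths under `root` whose bytes are not valid UTF-8.

    Assumption: only files with the given (hypothetical) suffixes matter.
    """
    flagged = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            # Not UTF-8: possibly ISO 8859, a Windows code page, or something else.
            flagged.append(str(path))
    return flagged
```

Running `non_utf8_files("webgui")` on the extracted WEB GUI sources would give 
a list of files worth inspecting or converting first.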

Kind regards, Michael 

> On 12. May 2021, at 11:29, huangt...@hotmail.com wrote:
> Hi Michael,
>     Thanks for your suggestion.
>     Regarding to edit the file in question to see what the problem is, I did 
> get the same suggestion when I searched the Google using the keyword
> "SAXException: [word/document.xml line 2]" yesterday. That post said that I 
> could treat the .docx file as a zipped file, and unzipped it to find what the 
> error was.
> After extracting the "document.xml" file from the corrupted report, line 2 
> was composed of extremely long characters ended with unknown symbol. By 
> deleting the
> unknown symbol, I could view the report now.
>     Regarding to your finding about the generated SPDX file with 13 mio 
> lines, it is true. The uploaded file was a tar file composed of more than 
> 100,000, and that's
> the reason why the SPDX file contained huge amount of lines.
>     Your observation also leads a question in my mind. What should be the 
> suitable upload to be scanned by the Fossology tool?
>     The uploaded file was the whole source codes of my project, and it was 
> composed of the following modules.
> 1.  SDK from the vendor.
> 2.  Linux kernel.
> 3.  WEB GUI design.
> 4.  BSP.
> 5.  Our applications in C and shell scripts.
>     The main purpose of scanning our codes by Fossology is for OSS 
> compliance. Since the SDK from vendor is designed by themselves, those codes
> shouldn't be the OSS. I am wondering whether I should remove the SDK module 
> from the tar file before uploading to the scheduler.
>     By the way, the WEB GUI design supports multi-language. Since some of the 
> symbols in the web page files might use the character sets not being
> recognized by the Fossology tool, I am wondering whether this the reason why 
> I failed to open the generated report.
>     If the above two assumptions are true, does it mean that we need to 
> manually filter out some "inappropriate" files before scanning the tar file?
>     Thanks.
>          Todd

View/Reply Online (#3453): https://lists.fossology.org/g/fossology/message/3453